
Statistics Department Seminar Series: Noah Simon, Associate Professor, Department of Biostatistics, University of Washington

"Stochastic optimization for online non-parametric estimation"
Friday, April 2, 2021
10:00-11:00 AM
Abstract: Non-parametric modeling of regression functions is an important contemporary topic. Many machine learning (ML) methods can be viewed through the lens of non-parametric estimation. Often we aim to learn a predictive model or decision rule and must select model complexity to balance bias and variance. The optimal model complexity depends on the number of available samples as well as the complexity of the underlying truth. In ML, model complexity is generally selected based on empirical performance rather than theoretical results; nevertheless, theory can help us understand the limits of model performance and motivate methodological ideas.

When building ML models with large datasets, mini-batch stochastic optimization methods are generally used: a model is iteratively updated based on random small subsamples of the full dataset. This is often framed theoretically by imagining a dataset that is continually updated with batches of new samples. This is the so-called "online setting," and it may also be the appropriate theoretical framework when the performance of our method is compute-constrained rather than data-constrained.
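To make the online setting concrete, here is a minimal sketch of single-pass mini-batch SGD for least squares: each batch arrives once, drives one gradient step, and is then discarded. The stream, step size, and batch size are illustrative choices, not anything specified in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
w_true = rng.normal(size=d)  # the unknown "truth" generating the stream

def batch_stream(n_batches, batch_size, noise=0.1):
    """Simulate the online setting: batches of fresh samples arrive over time."""
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, d))
        y = X @ w_true + noise * rng.normal(size=batch_size)
        yield X, y

def online_sgd(stream, lr=0.05):
    """Single-pass mini-batch SGD: each batch is used once, then discarded."""
    w = np.zeros(d)
    for X, y in stream:
        grad = X.T @ (X @ w - y) / len(y)  # gradient of the batch mean squared error
        w -= lr * grad
    return w

w_hat = online_sgd(batch_stream(n_batches=2000, batch_size=16))
```

Note that memory use here is constant in the number of batches: only the current parameter vector and the current batch are ever held, which is what makes the memory questions below meaningful.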

As noted, the optimal complexity of our model (for balancing bias and variance) is dependent on the sample size. Thus, in the online scenario, it is natural to consider a model of growing complexity (as more observations are obtained). This brings up further questions: How do we appropriately increase complexity? Do we need to store and re-use old observations as complexity increases? How computationally efficient can we be without losing predictive performance?

In this talk, I consider these questions in a simplified model. I consider estimating a regression function that is smooth (lives in a Sobolev ellipsoid). I propose an almost painfully simple stochastic gradient descent method (in a linear space of growing dimension) and show that 1) this estimator achieves the minimax L2 estimation error rate; and 2) under some assumptions, this estimator uses the minimal memory of any estimator that achieves that minimax rate. I relate this to more classical sieve/projection estimators, and to other stochastic gradient-based methods and ideas. These results also hold for estimating regression functions in an RKHS.
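A minimal sketch of the idea of SGD in a linear space of growing dimension: expand the regression function in an orthonormal basis, and let the number of basis coefficients grow with the sample count t. The cosine basis, the growth schedule J_t ~ t^{1/(2s+1)} (the usual bias/variance balance for s-smooth functions), and the step size are all illustrative assumptions, not the talk's exact estimator or tuning.

```python
import numpy as np

rng = np.random.default_rng(1)

def phi(x, j):
    """Cosine basis on [0, 1], orthonormal under the uniform measure."""
    return 1.0 if j == 0 else np.sqrt(2) * np.cos(np.pi * j * x)

def features(x, J):
    return np.array([phi(x, j) for j in range(J)])

def sieve_sgd(stream, s=2, lr0=0.5):
    """One SGD pass where the coefficient vector grows as samples arrive.

    J_t ~ t^{1/(2s+1)} mimics the optimal complexity for an s-smooth
    (Sobolev) target; old observations are never stored or revisited.
    """
    theta = np.zeros(1)
    for t, (x, y) in enumerate(stream, start=1):
        J = max(1, int(t ** (1.0 / (2 * s + 1))))
        if J > len(theta):                       # grow the linear space
            theta = np.pad(theta, (0, J - len(theta)))
        z = features(x, len(theta))
        resid = theta @ z - y
        theta -= (lr0 / np.sqrt(t)) * resid * z  # one stochastic gradient step
    return theta

f_true = lambda x: np.sqrt(2) * np.cos(2 * np.pi * x)  # a smooth target in the basis
stream = ((x, f_true(x) + 0.1 * rng.normal())
          for x in rng.uniform(size=5000))
theta_hat = sieve_sgd(stream)
```

As in the classical sieve/projection picture, truncating the basis at J_t controls variance while the smoothness of the target controls the bias of the discarded coordinates; the sketch simply performs that truncation on the fly.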

This seminar will be livestreamed via Zoom. A virtual reception will follow.
Building: Off Campus Location
Location: Virtual
Event Type: Workshop / Seminar
Tags: seminar
Source: Happening @ Michigan from Department of Statistics, Department of Statistics Seminar Series