New Perspectives on High-Dimensional Estimation: Maximum Likelihood and Test-Time Training

Faculty of Informatics - Academic Studies Administration

Date: 3 February 2026 / 10:00 - 10:45

USI East Campus, Room C1.03

Speaker: Gil Kur, ETH

Abstract: In the theory part of the talk, we study the statistical performance of Maximum Likelihood Estimation (MLE) and, more generally, Empirical Risk Minimization (ERM). While MLE is known to be minimax optimal for low-complexity models, classical work showed that it can be suboptimal over “large” function classes, though those examples are somewhat pathological. First, we develop a technique for detecting and quantifying the suboptimality of ERM in regression over high-dimensional nonparametric classes. Second, we show that the variance term of ERM procedures is always upper-bounded by the minimax rate, implying that any minimax suboptimality must arise from bias. Third, we present the first minimax-optimal estimator with polynomial runtime in the sample size for convex regression in all dimensions. We then discuss applications of the local theory of Banach spaces to minimum-norm interpolators, building on an approach of Pisier and Maurey. In the applied part of the talk, we propose an explanation for the empirical success of Test-Time Training (TTT) in foundation models, which we primarily validate through experiments with sparse autoencoders (SAEs). TTT identifies the “most similar” points in the training data to a given evaluation point and improves predictions by locally adapting the model to this selected neighborhood. Although TTT was discussed in earlier work, it has only recently been shown to yield significant performance gains in foundation models across domains such as image generation, control, and language modeling.

Biography: Gil Kur is a postdoctoral fellow at ETH Zürich, primarily hosted by Andreas Krause. He completed his PhD in Electrical Engineering and Computer Science at MIT under the supervision of Sasha Rakhlin and earned an MSc from the Weizmann Institute of Science under the supervision of Boaz Nadler. His research focuses on statistical learning theory, nonparametric and high-dimensional statistics, and methodology for foundation models.
 

Host: Prof. Ernst Wit