Most learning problems are formulated as optimization problems based on an observed data sample (the training set). Optimizing an objective defined from this sample yields an estimator that performs well on the training set. However, the main interest lies in the generalization ability of this estimator, that is, its performance on new observations. With the emergence of large datasets since the 2000s, the connection between the algorithm used and the generalization capacity of the resulting estimator has become a major topic of interest.
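To make this precise with generic notation (an illustration, not taken verbatim from the course text): given a loss $\ell$ and a training sample $(x_1, y_1), \dots, (x_n, y_n)$ drawn from an unknown distribution $\mathcal{D}$, training minimizes the empirical risk, while generalization is measured by the expected risk of the resulting estimator $\hat{f}_n$:

```latex
% Generic notation, used here only for illustration:
% \hat{R}_n is the empirical risk minimized during training,
% R is the expected risk that measures performance on new observations.
\hat{R}_n(f) = \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(f(x_i), y_i\bigr),
\qquad
R(f) = \mathbb{E}_{(X,Y)\sim\mathcal{D}}\bigl[\ell\bigl(f(X), Y\bigr)\bigr],
\qquad
R(\hat{f}_n) - \hat{R}_n(\hat{f}_n) \ \text{(generalization gap)}.
```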
Today, the question of generalization remains a major research topic, both in its theoretical and practical aspects. In this course, we will explore both theoretical and heuristic results that help address this problem. Specifically, we will first study several approaches that provide theoretical guarantees on the generalization of algorithms, in particular those based on complexity, stability, and on stopping the optimization early (early stopping, stochastic approximation). In the second part, we will study heuristic approaches and the differences, observed or explained, that arise in the deep learning setting (non-convex and over-parameterized).
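As a purely illustrative sketch (hypothetical code, not course material), the snippet below combines two of the ingredients named above, stochastic gradient descent and early stopping: SGD on a least-squares objective, with the iterate selected by the risk on a held-out validation set.

```python
# Illustrative sketch only: SGD on least-squares, stopped early
# when the validation risk stops improving. All names and the
# synthetic data are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = <w*, x> + noise
n, d = 1000, 50
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.5 * rng.normal(size=n)

# Train / validation split used by the early-stopping rule
X_tr, y_tr = X[:800], y[:800]
X_val, y_val = X[800:], y[800:]

def risk(w, X, y):
    """Empirical least-squares risk."""
    return 0.5 * np.mean((X @ w - y) ** 2)

w = np.zeros(d)
step = 0.01  # constant step size (an illustrative choice)
patience, best_val, best_w, bad_epochs = 5, np.inf, w.copy(), 0

for epoch in range(200):
    # One pass of SGD over the training set
    for i in rng.permutation(len(y_tr)):
        grad = (X_tr[i] @ w - y_tr[i]) * X_tr[i]
        w -= step * grad
    # Early stopping: keep the iterate with the best validation risk
    val = risk(w, X_val, y_val)
    if val < best_val:
        best_val, best_w, bad_epochs = val, w.copy(), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break

print(f"stopped after epoch {epoch + 1}; "
      f"selected iterate has validation risk {risk(best_w, X_val, y_val):.4f}")
```

Here the stopping rule plays the role of a regularizer: limiting the number of passes limits how closely the iterate can fit the training sample.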
Prerequisites: basic knowledge of convex optimization and statistics. Having completed the course on optimization for data science will help in understanding the various algorithms involved.
Non-exhaustive reference list:
- Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, P. Bartlett, S. Mendelson
- The Tradeoffs of Large Scale Learning, L. Bottou, O. Bousquet
- Stability and Generalization, O. Bousquet, A. Elisseeff
- Train faster, generalize better: Stability of stochastic gradient descent, M. Hardt, B. Recht, Y. Singer
- Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), F. Bach, E. Moulines
- Understanding deep learning requires rethinking generalization, C. Zhang, S. Bengio, M. Hardt, B. Recht, O. Vinyals
- On early stopping in gradient descent learning, Y. Yao, L. Rosasco, A. Caponnetto
- Generalization properties of multiple passes stochastic gradient method, S. Villa
- Competing with the empirical risk minimizer in a single pass, R. Frostig, R. Ge, S. M. Kakade, A. Sidford
- Deep Learning and Generalization, O. Bousquet
- Teaching coordinators: Aymeric Dieuleveut, Hadrien Hendrikx