**Objectives**

The objective of this course is to introduce linear and non-linear regression (logistic regression and generalized linear models). Regression plays a key role in many applied problems, and it is essential for a data scientist to understand both the theory and the practice of regression analysis. Regression is also an important vehicle for addressing the central challenges of statistical learning: model selection, penalization, resampling (bootstrap, cross-validation), robustness, outlier detection, and methods for detecting deviations from an assumed model. The course will also serve to sharpen the understanding of statistical techniques, covering both estimation and testing.

**Syllabus**

- Introduction to statistical learning
  - Regression: objectives and applications

- Linear models: interpretation, examples
  - Least-squares estimators: properties (bias, variance)
  - Case study: univariate and multivariate regression
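As a minimal, hypothetical illustration of the least-squares estimator in the univariate case, the sketch below computes the closed-form slope $\hat\beta_1 = S_{xy}/S_{xx}$ and intercept $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$ in plain Python rather than R; `ols_fit` and the toy data are illustrative names, not course material.

```python
def ols_fit(x, y):
    """Closed-form least-squares fit of y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # S_xy and S_xx: centered cross-products and centered sum of squares
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


if __name__ == "__main__":
    # Toy data lying exactly on y = 1 + 2x
    x = [0.0, 1.0, 2.0, 3.0]
    y = [1.0, 3.0, 5.0, 7.0]
    print(ols_fit(x, y))  # (1.0, 2.0)
```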

- Multivariate Linear Regression: Parametric case
  - Asymptotic properties
  - Gaussian case (distribution of the parameters, confidence regions)
  - Confidence intervals and tests
  - Classical regression diagnostics (leverage points)
  - Case study: understanding multiple linear regression with R (lm summary, detecting outliers, interpreting classical regression diagnostics)
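Leverage is the diagonal of the hat matrix $H = X(X^\top X)^{-1}X^\top$. For a simple regression with an intercept it reduces to $h_{ii} = 1/n + (x_i - \bar x)^2 / \sum_k (x_k - \bar x)^2$, which the sketch below computes in plain Python. The leverages sum to the number of parameters $p$; the $2p/n$ cutoff used here is one common rule of thumb, not a universal threshold, and the function name and data are hypothetical.

```python
def leverages(x):
    """Diagonal of H = X (X^T X)^{-1} X^T for a simple linear
    regression with intercept (design columns [1, x])."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / s_xx for xi in x]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 20.0]  # the last point is far from the others
    h = leverages(x)
    p, n = 2, len(x)  # 2 parameters: intercept and slope
    # Rule of thumb (assumption): flag points with leverage above 2p/n
    flagged = [i for i, hi in enumerate(h) if hi > 2 * p / n]
    print([round(hi, 3) for hi in h], flagged)
```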

- Multivariate Linear Regression: Non-parametric case
  - Introduction to non-parametric regression: from parameters to functions
  - Function classes, model selection
  - Variable choice / Bases / Splines
  - Bias / Variance (Approximation Error / Estimation Error)
  - Case study: spline regression

- Model Selection and Resampling
  - Approximation Error / Estimation Error
  - Learning Error / Generalization Error
  - Resampling-based methods: jackknife, bootstrap, cross-validation
  - Case study: model selection with CV
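K-fold cross-validation estimates the generalization error by refitting on k − 1 folds and scoring on the held-out fold. The sketch below uses it to compare a constant model with a least-squares line; the helper names and the deterministic fold assignment (index modulo k, no shuffling) are illustrative assumptions. Setting k = n gives leave-one-out cross-validation.

```python
def fit_mean(x, y):
    """Constant model: predicts the training mean of y."""
    m = sum(y) / len(y)
    return lambda t: m


def fit_line(x, y):
    """Least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return lambda t: b0 + b1 * t


def kfold_cv_mse(x, y, fit, k=5):
    """K-fold CV estimate of the prediction MSE.
    Folds are formed by index modulo k (no shuffling)."""
    n = len(x)
    sq_errors = []
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        model = fit([x[i] for i in train], [y[i] for i in train])
        sq_errors += [(y[i] - model(x[i])) ** 2 for i in test]
    return sum(sq_errors) / n


if __name__ == "__main__":
    x = [float(i) for i in range(10)]
    # Nearly linear data with a small alternating perturbation
    y = [2 * xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]
    print(kfold_cv_mse(x, y, fit_mean), kfold_cv_mse(x, y, fit_line))
```

The model with the smaller cross-validated error (here the line) would be selected.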

- Model Selection and Unbiased Risk Estimation
  - Unbiased Risk Estimation
  - AIC/BIC Penalization and Exhaustive Exploration
  - Forward / Backward and Stochastic Exploration
  - Multiple tests
  - Case study: model selection with exploration
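For a Gaussian linear model with $k$ estimated coefficients fitted on $n$ observations with residual sum of squares $\mathrm{RSS}$, the two criteria take the following standard form, up to additive constants that do not depend on the model (the notation is ours, not necessarily the course's):

$$
\mathrm{AIC} = n \log\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k,
\qquad
\mathrm{BIC} = n \log\!\left(\frac{\mathrm{RSS}}{n}\right) + k \log n .
$$

Since $\log n > 2$ once $n \ge 8$, BIC penalizes model size more heavily than AIC for all but tiny samples. Exhaustive exploration evaluates the criterion on all $2^p$ subsets of $p$ candidate variables and keeps the minimizer, whereas forward/backward exploration only visits a path of nested models.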

- Model Selection and Penalization
  - Restricted Model and Penalization
  - Ridge and Lasso
  - Numerical algorithms: Gradient Descent and Coordinate Descent
  - Case study: Coordinate Descent and Lasso
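A minimal sketch of cyclic coordinate descent for the Lasso, assuming the objective $\frac{1}{2n}\lVert y - X\beta\rVert^2 + \lambda\lVert\beta\rVert_1$ with centered data and no intercept (other scalings of $\lambda$ are common). Each coordinate update is a soft-thresholding of the correlation between feature $j$ and the partial residual; `lasso_cd` and the toy data are hypothetical.

```python
def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/(2n))||y - X b||^2 + lam*||b||_1.
    X is a list of rows; X and y are assumed centered (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    r = list(y)  # residual y - X beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # (1/n) x_j^T (partial residual that excludes feature j)
            rho = sum(X[i][j] * (r[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated beta_j
                for i in range(n):
                    r[i] -= X[i][j] * delta
            beta[j] = new_bj
    return beta


if __name__ == "__main__":
    X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    y = [3.0, -3.0, 0.5, -0.5]
    print(lasso_cd(X, y, lam=0.5))  # second coefficient is set exactly to 0
```

Storing and updating the residual keeps each coordinate step at O(n) operations instead of recomputing $X\beta$ from scratch.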

- Logistic Regression
  - Classification and Binary output
  - Maximum Likelihood Approach
  - Penalization
  - Numerical algorithms: Gradient Descent, Stochastic Gradient Descent
  - Case study: logistic regression and model selection
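A minimal sketch of maximum-likelihood logistic regression fitted by batch gradient descent on the average negative log-likelihood, assuming a fixed step size and iteration count chosen for illustration (no line search, no penalty). Replacing the full-sample gradient by the gradient at one randomly drawn observation per step would give stochastic gradient descent. The names and data are hypothetical; the labels are deliberately not linearly separable so that the maximum-likelihood estimate exists.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logistic_gd(X, y, step=0.5, n_iter=2000):
    """Batch gradient descent on
    (1/n) * sum_i [log(1 + exp(x_i^T b)) - y_i * x_i^T b].
    X is a list of rows (include a column of ones for an intercept), y in {0, 1}."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            # Gradient contribution: (sigmoid(x_i^T b) - y_i) * x_i / n
            err = sigmoid(sum(a * b for a, b in zip(xi, beta))) - yi
            for j in range(p):
                grad[j] += err * xi[j] / n
        beta = [b - step * g for b, g in zip(beta, grad)]
    return beta


if __name__ == "__main__":
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
    ys = [0, 0, 1, 0, 1]
    print(logistic_gd([[1.0, x] for x in xs], ys))
```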

- Generalized Linear Models
  - The exponential family (definitions, examples, log-partition function)
  - Generalized linear models basics (ML/MAP estimators)
  - Probit regression: latent variable interpretation, multinomial probit models
  - Case study: multiclass classification
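A short summary of the objects named above, in the one-parameter canonical form with the dispersion parameter set to 1 (a simplifying assumption; the course may use a more general parametrization). The log-partition function $A$ generates the mean and variance:

$$
p_\theta(y) = h(y)\,\exp\big(\theta\,y - A(\theta)\big),
\qquad
A'(\theta) = \mathbb{E}_\theta[Y],
\qquad
A''(\theta) = \operatorname{Var}_\theta(Y).
$$

For a Bernoulli variable with $\mu = P(Y = 1)$, one has $\theta = \log\frac{\mu}{1-\mu}$, $A(\theta) = \log(1 + e^{\theta})$ and $A'(\theta) = \frac{1}{1 + e^{-\theta}} = \mu$; setting $\theta_i = x_i^\top\beta$ (canonical link) gives logistic regression. The probit model instead uses a latent Gaussian variable:

$$
Y_i^{*} = x_i^\top\beta + \varepsilon_i,\quad \varepsilon_i \sim \mathcal{N}(0, 1),
\qquad
Y_i = \mathbf{1}\{Y_i^{*} > 0\}
\;\Longrightarrow\;
P(Y_i = 1 \mid x_i) = \Phi(x_i^\top\beta).
$$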

**Assessment:** Final exam

**Course language:** French, with slides in English

- Teaching coordinator: Julie Josse
- Teaching coordinator: Erwan Le Pennec
- Teaching coordinator: Eric Moulines