### MAP553 - The art of regression (2016-2017)

Objectives

The objective of this course is to introduce linear and non-linear regression (logistic regression and generalized linear models). Regression plays a key role in many problems, and it is essential for a data scientist to understand both the theory and the practice of regression analysis. It is also an important vehicle for addressing the challenges of statistical learning: model selection, penalization, resampling (bootstrap, cross-validation), robustness, detection of outliers, and methods to detect deviations from an assumed model. The course will also serve as motivation to sharpen the understanding of statistical techniques, covering both estimation and tests.

Syllabus

1. Introduction to statistical learning
• Regression: objectives and applications
• Linear models: interpretation, examples
• Least-squares estimators: properties (bias, variance)
• Case study: univariate and multivariate regression
2. Multivariate Linear Regression: Parametric case
• Asymptotic properties
• Gaussian case (distribution of the parameters, confidence regions)
• Confidence intervals and tests
• Classical regression diagnostics (leverage points)
• Case study: understanding multiple linear regression with R (lm summary, detecting outliers, understanding classical regression diagnostics)
3. Multivariate Linear Regression: Non-parametric case
• Introduction to non-parametric regression: from parameters to functions
• Function classes, model selection
• Variable choice / Basis / Spline
• Bias / Variance (Approximation error / Estimation Error)
• Case study: spline regression
4. Model Selection and Resampling
• Approximation Error / Estimation Error
• Learning Error / Generalization Error
• Resampling-based methods: jackknife, bootstrap, cross-validation
• Case study: model selection with CV
5. Model Selection and Unbiased Risk Estimation
• Unbiased Risk Estimation
• AIC/BIC Penalization and Exhaustive Exploration
• Forward / Backward and Stochastic Exploration
• Multiple tests
• Case study: model selection with exploration
6. Model Selection and Penalization
• Restricted Model and Penalization
• Ridge and Lasso
• Numerical algorithms: Gradient Descent and Coordinate Descent
• Case study: Coordinate Descent and Lasso
7. Logistic Regression
• Classification and Binary output
• Maximum Likelihood Approach
• Penalization