Objectives

The objective of this course is to introduce linear and non-linear regression (logistic regression and generalized linear models). Regression plays a key role in many problems, and it is essential for a data scientist to understand both the theory and the practice of regression analysis. It is also an important vehicle for addressing the central challenges of statistical learning: model selection, penalization, resampling (bootstrap, cross-validation), robustness, detection of outliers, and methods to detect deviations from an assumed model. The course will also sharpen the understanding of general statistical techniques, covering both estimation and testing.
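
As a concrete flavour of the practical side of the course, the short R sketch below (illustrative only; the dataset and variables are chosen arbitrarily, not taken from the course material) fits a least-squares model and inspects the classical diagnostics listed in the syllabus.

    # Ordinary least squares on a built-in dataset, followed by classical diagnostics
    fit <- lm(mpg ~ wt + hp, data = mtcars)   # regress fuel consumption on weight and horsepower
    summary(fit)       # coefficient estimates, standard errors, t-tests, R^2
    confint(fit)       # confidence intervals for the parameters
    par(mfrow = c(2, 2))
    plot(fit)          # residuals, QQ-plot, leverage and Cook's distance
    hatvalues(fit)     # leverage of each observation (potential influential points)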

Syllabus

  1. Introduction to statistical learning
    • Regression: objectives and applications
    • Linear models: interpretation, examples
    • Properties of least-squares estimators (bias, variance)
    • Case study: univariate and multivariate regression
  2. Multivariate Linear Regression: Parametric case
    • Asymptotic properties
    • Gaussian case (distribution of the parameters, confidence regions)
    • Confidence intervals and tests
    • Classical regression diagnostics (leverage points)
    • Case study: understanding multiple linear regression with R (lm summary, detecting outliers, understanding classical regression diagnostics)
  3. Multivariate Linear Regression: Non-parametric case
    • Introduction to non-parametric regression: from parameters to functions
    • Function classes, model selection
    • Variable choice / Basis / Spline
    • Bias / variance (approximation error / estimation error)
    • Case study: spline regression
  4. Model Selection and Resampling
    • Approximation Error / Estimation Error
    • Learning Error / Generalization Error
    • Resampling-based methods: jackknife, bootstrap, cross-validation
    • Case study: model selection with CV
  5. Model Selection and Unbiased Risk Estimation
    • Unbiased Risk Estimation
    • AIC/BIC Penalization and Exhaustive Exploration
    • Forward / Backward and Stochastic Exploration
    • Multiple tests
    • Case study: model selection with exploration
  6. Model Selection and Penalization
    • Restricted Model and Penalization
    • Ridge and Lasso
    • Numerical algorithm: Gradient Descent and Coordinate Descent
    • Case study: Coordinate Descent and Lasso (see the illustrative sketch at the end of this page)
  7. Logistic Regression
    • Classification and Binary output
    • Maximum Likelihood Approach
    • Penalization
    • Numerical Algorithm: Gradient Descent, Stochastic Gradient Descent
    • Case study: Logistic and model selection
  8. Generalized Linear Models
    • The exponential family (definitions, examples, log-partition function)
    • Generalized linear models basics (ML/MAP estimators)
    • Probit regression: latent variable interpretation, multinomial probit models
    • Case study: multiclass classification
  • Evaluation: final exam

    Course language: French, with slides in English
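
Several case studies (cross-validation, the lasso, coordinate descent) are hands-on. The R sketch below is a minimal, purely illustrative coordinate-descent solver for the lasso objective (1/(2n)) * ||y - X b||^2 + lambda * ||b||_1; the function names and the simulated data are invented for the example.

    # Soft-thresholding operator used by the coordinate-wise updates
    soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

    lasso_cd <- function(X, y, lambda, n_iter = 200) {
      n <- nrow(X); p <- ncol(X)
      X <- scale(X)                  # standardize the columns (mean 0, unit variance)
      y <- y - mean(y)               # center the response; no intercept in the penalized problem
      beta <- rep(0, p)
      for (it in seq_len(n_iter)) {
        for (j in seq_len(p)) {
          r_j <- y - X[, -j, drop = FALSE] %*% beta[-j]   # partial residual without coordinate j
          z_j <- crossprod(X[, j], r_j) / n               # univariate least-squares coefficient
          beta[j] <- soft_threshold(z_j, lambda) / (crossprod(X[, j]) / n)
        }
      }
      beta
    }

    # Simulated example: only the first three coefficients are truly non-zero
    set.seed(1)
    X <- matrix(rnorm(100 * 10), 100, 10)
    y <- X %*% c(3, -2, 1, rep(0, 7)) + rnorm(100)
    round(lasso_cd(X, y, lambda = 0.1), 2)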