
Modern data analysis relies on high-level languages such as Python or R for data handling and processing. However, standard libraries like scikit-learn rest on core implementations written in low-level languages such as C or C++, for faster execution and optimized resource management. This course therefore has a twofold goal: first, to get acquainted with some of the standard techniques in data analysis and machine learning; second, to acquire some expertise in C/C++ programming, so that students can then adapt existing low-level implementations to their needs. Note that the programming paradigms covered in this course are almost exclusively sequential: concurrent programming is addressed only marginally, during the last session, and is the subject of other courses.

 

Topics of the sessions (data analysis / C++):

1. Short introduction to data analysis / C++ as C (1/2)

2. Nearest neighbors search and retrieval in databases / C++ as C (2/2)

3. k-means / structs and classes (1/2)

4. Hierarchical clustering / structs and classes (2/2)

5. Density estimation / inheritance

6. k-NN classification and regression / genericity

7. Linear models for regression / STL

8. Linear models for classification / C++11

9. Feature extraction / -

10. Neural networks / multithreading

 

References:

On data analysis:

  • Hastie, Tibshirani, Friedman: The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer, 2009.
  • Bishop: Pattern Recognition and Machine Learning. Springer, 2006.

On C++:

  • Weiss: C++ for Java Programmers. Prentice Hall, 2003.
  • Stroustrup: The C++ Programming Language (4th ed.). Addison-Wesley, 2013.

 

 

Prerequisites: INF371 or INF411

Recommended: MAP433 and INF421
