Course objectives:
- Introducing the concept of data stream processing
- Learning the basics of Data Stream Management Systems (DSMS) and how to use them
- Understanding the main summarization techniques used for stream processing: sampling, sketching, etc.
- Understanding and using the main data stream processing algorithms
Syllabus:
This course deals with the algorithms and software commonly used to process large data streams. It aims to convey the main difficulties and specificities of this type of data, the different types of streams that exist, the theoretical models and practical algorithms used to analyze them, and the right tools to process them.
After a conceptual introduction to data streams, this class approaches data stream processing from two different angles:
- A Machine Learning and Data Mining approach covering the theoretical and algorithmic difficulties of learning from data streams: online learning vs. incremental and batch learning, and sampling techniques.
- A more practical approach with an introduction to the various systems and software that are used to handle these data.
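As an illustration of the sampling techniques mentioned above, the following is a minimal sketch of reservoir sampling, a classic way to keep a uniform random sample of fixed size from a stream whose length is not known in advance. It is not taken from the course material: the function name `reservoir_sample`, its parameters, and the choice of Python are assumptions made for this example only.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to k items from a stream.

    Hypothetical helper for illustration: it reads the stream once and
    stores at most k items, whatever the stream length.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-indexed) is kept with probability k / (i + 1):
            # draw a slot in [0, i] and replace only if it falls in the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 values from a stream of 100,000 integers in a single pass.
    print(reservoir_sample(range(100_000), k=5, seed=42))
```

The sketch processes each item exactly once and never stores more than `k` items, which is the constraint that distinguishes stream processing from batch processing.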
In terms of organization, the course will alternate between lectures and practical sessions. During the last class, each student will present a recent research article of their choice on the subject of data stream processing.
Prerequisites:
- Basic knowledge of SQL
- Basic knowledge of Machine Learning (supervised and unsupervised)
- Knowledge of Java programming is recommended but not mandatory
Evaluation:
- The practical sessions account for ⅔ of the final grade
- The research paper presentation accounts for ⅓ of the final grade
Teaching coordinator: Diao Yanlei