Text is everywhere. Text mining, Natural Language Processing (NLP), information retrieval and recommendation systems matter greatly in applied settings and are highly sought after by industrial actors (Google, Facebook, Amazon, Twitter and Yahoo!, to name only a few). Beyond these major players, there is currently vast industrial interest in search engines, web marketing and advertising, social network analysis, recommendation systems and so on, most of which deal primarily with text. This huge market and strong demand have made NLP an important research field with high visibility and impact.
Weeks 1-4: Text mining (teaching team: Prof. M. Vazirgiannis, Prof. A. Papadopoulos, Dr. A. Tixier, K. Skianis)
- 06/01. Information retrieval (lecture) / Text preprocessing and IR (lab).
- 13/01. Advanced information retrieval and graphs of words (lecture) / Keyword extraction (lab).
- 27/01. Text categorization, topic modeling (lecture) / Supervised document classification (lab).
- 03/02. Word embeddings (lecture) / Unsupervised document classification (lab).
Weeks 5-8: Natural Language Processing (by Prof. Nadi Tomeh, Univ. Paris 13)
- 10/02. Language modeling (lecture) / N-Gram language models (lab).
- 24/02. Stochastic tagging and discriminative sequence labeling (lecture) / POS-tagger and named entity recognition (lab).
- 03/03. Grammars, constituency & dependency parsing (lecture) / Stochastic parser & relation extractor (lab).
- 10/03. Automatic translation or Questions & Answers (lecture) / IBM Watson Technologies (lab).
Week 9: Text mining (teaching team: Prof. M. Vazirgiannis, Prof. A. Papadopoulos, Dr. A. Tixier, K. Skianis)
- 15/03. Deep learning for NLP (lecture) / Application of NNs to text mining (lab).
1. The course is divided into 9 sessions of 4 hours each. Each session consists of a lecture (08:30 - 10:15) followed by a lab (10:30 - 12:30). Both lectures and labs take place in Amphi Monge. The first five sessions (06, 13 and 27 Jan, 03 and 10 Feb) will be given by the Data Science and Mining team at LIX (specifically M. Vazirgiannis, A. Papadopoulos and A. Tixier), while the last four (24 Feb, 03, 10 and 15 March) will be given by Prof. Nadi Tomeh (Univ. Paris 13).
2. For the labs, students are expected to bring their own laptops (preferably running a Unix-like environment such as Linux or Mac OS X, for compatibility reasons). The labs will use Python 2.7.
3. The evaluation for the course will be based on a data science competition. More details will be provided in due time.
4. We will use the e-learning platform Moodle to share the course materials (slides and lab material). It is therefore imperative that students enroll (enrollment is available once logged in to Moodle with their @polytechnique.edu account, using the enrollment key specified in the welcome email they received). Additionally, the forum should be used to communicate with the staff following the guidelines
- Teaching coordinator: Apostolos Papadopoulos
- Teaching coordinator: Konstantinos Skianis
- Teaching coordinator: Antoine Tixier
- Teaching coordinator: Michalis Vazirgiannis