Big Data marks the beginning of a major transformation that will profoundly affect every sector, from e-commerce to scientific research, finance and health. Exploiting these huge volumes of data requires sophisticated mathematical techniques for extracting the relevant information. Together, these methods form the foundation of data science. This passage from data to knowledge raises many challenges that call for an interdisciplinary approach. Data science relies heavily on the statistical processing of information: mathematical statistics, computational statistics, and statistical learning (machine learning). From exploratory data analysis to the most sophisticated techniques of inference (hierarchical graphical models) and of classification or regression (deep learning, support vector machines), a wide range of methods from mathematical and computational statistics and from learning is brought to bear. To be deployed at the scale of massive data sets, these methods require mastering the mechanisms for distributing data and computations at very large scale. Applied mathematics (functional analysis, numerical analysis, convex and non-convex optimization) also has an essential role to play. From an application standpoint, data science has a strong impact on many sectors. There is currently a large worldwide shortage of data scientists and data analysts. Students trained in data science and Big Data are therefore in high demand on the job market. This job market is global and covers both developed and emerging economies. As in all fields of disruptive innovation (biotechnology, e-medicine), the need for very high-level engineers and doctoral students is also significant.

1st period: until November 18

Data Camp

Instructor: Alexandre GRAMFORT

Credits: 5 ECTS

Syllabus:

You will put your basic machine learning and data analysis knowledge to the test by

  1. solving practical data science problems in scientific or industrial applications and by
  2. designing data science workflows.


Code-submission data challenge

To achieve the first objective, you will participate in a data challenge at the RAMP site. The particularity of RAMPs (vs Kaggle) is that you will submit code, not predictions. Your code will be inserted into a predictive workflow, trained and tested. A public cross-validation score will be available on a public leaderboard in real time. Your grade will be a function of the private test score of your best submission, obtained on a hidden test set. The challenge will also include a collaborative phase in which you can access all the submitted solutions, and you will be allowed and encouraged to reuse each other’s code. Part of your grade will come from your activity in this collaborative phase.

You will be able to choose from two to five problems coming from scientific or industrial applications (e.g., brain imaging, astrophysics, biology/chemistry, ad placement, insurance pricing). You can participate in more than one challenge: we will grade you based on your best performance.

The starting kit

Each challenge will come with an open source starting kit available at https://github.com/ramp-kits, containing

  • a Jupyter notebook that describes the industrial or scientific prediction problem, does some exploratory analysis and data visualization, and explains the predictive workflow and one or more basic solutions (example),
  • a Python script that parametrizes the challenge (example),
  • public training and test data sets, different from the official sets we use to evaluate your submissions at the RAMP site (example), and
  • one or more example submissions (example).

You will be able to test the example submission and all your subsequent submissions before submitting, using a simple command line test script (more information here).


  • Opening of the challenges: October 27, 12h30-14h30
  • Closing of the competitive phase: December 17, 20h
  • Closing of the collaborative phase: January 30, 20h


Half of your grade will come from the data challenge: 5/20 from the competitive phase and 5/20 from the collaborative phase. Selected students will be able to present their solutions to the class, for up to 2/20 bonus points.

Team project


To achieve the second objective, you will build a predictive workflow in teams of three to five, implementing a data-driven business/science case. We will give you a set of pointers to existing data sources, but you will be encouraged to find business/science cases and data sources on your own. Collaborating with research teams or businesses will also be highly regarded.

You will submit the projects as RAMP starting kits on github. We will not ask you here to optimize the solution, rather to focus on its design and its match to the business or science case.

The business/science case

Half of your grade (5/10) will come from the quality of the predictive business or science case that you will present in the preamble of the Jupyter notebook of your starting kit. The following questions are to guide you in this exercise:

  • What do we want to predict? How will a good prediction improve a key performance indicator (KPI) or lead to a scientific result?
  • How do we measure the quality or value of the prediction in the selected business or science problem? What will be the quantitative score? How does the quantitative score reflect the quality or value of the prediction? How does the (possibly asymmetric) prediction error convert into cost or decreased KPI?
  • Will the predictor be used as decision support, as a part of a fully automated system, or only as part of a report or feasibility study? How will an agent use the system?
  • What data do we need to develop a predictor? Could you find this data? What were the actual data sources? What other sources (private or public) could be exploited? What were and would be the data collection costs?
  • What data cleaning/tidying steps were required to obtain clean training data?
  • Given the data source(s) and the prediction goal, what is the workflow and the workflow elements? Will you need different expertise for the different steps?
  • How fast do the phenomena underlying the prediction problem change? How often will the model have to be retrained? What are the associated costs and risks?
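
To make the question about asymmetric prediction errors concrete, here is a minimal sketch of a score that converts errors into a cost. The function name and the two unit costs are hypothetical and chosen only for illustration; in a real case they would be derived from the business or science problem.

```python
def asymmetric_cost(y_true, y_pred, under_cost=3.0, over_cost=1.0):
    """Average cost when under-prediction is penalized more than over-prediction.

    under_cost and over_cost are illustrative unit costs (assumptions).
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        error = p - t
        if error < 0:
            # Predicted too low: each missing unit costs `under_cost`.
            total += under_cost * (-error)
        else:
            # Predicted too high: each surplus unit costs `over_cost`.
            total += over_cost * error
    return total / len(y_true)


demand = [10.0, 12.0, 8.0]
forecast_low = [9.0, 11.0, 7.0]    # always one unit too low
forecast_high = [11.0, 13.0, 9.0]  # always one unit too high

cost_low = asymmetric_cost(demand, forecast_low)
cost_high = asymmetric_cost(demand, forecast_high)
```

Both forecasts have the same absolute error, yet the one that under-predicts is three times as costly under these assumed unit costs, which is the kind of link between score and KPI the questions above ask you to spell out.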

The technical quality

The second half of your grade (5/10) will come from the technical quality of your solution. Your kit will have to pass the ramp_test_submission test. We will pay close attention to your validation setup (Is the validation reasonable? Do you have enough test data to see significant differences between submissions?). We will also grade the quality of the exploratory analysis and the clarity of the technical explanation of the workflow.


  • December 18: You should arrive at the data camp week prepared, having formed teams and having an approximate idea of the business/science problem you would like to tackle and of the potential data sources.
  • December 18 – 22: The data camp week, Ecole Polytechnique, Amphi Faure (9h – 17h). We will have lectures in the morning and guided work and student presentations in the afternoon (students presenting their solutions will get up to 2 bonus points). The tentative program:
  • January 30, 20h: deadline for submitting the projects.



The course will require that you develop code in Python. We strongly suggest that you start preparing. You should have a complete Python environment set up on your machine on the first day of the course. We recommend using Anaconda (https://www.continuum.io/downloads). It includes all required libraries. Here are some necessary resources: numpy, pandas, scikit-learn, xarray. You might also want to install seaborn, hyperopt, and xgboost. Some of the challenges strongly favor deep learning solutions; we will allow submissions both in pytorch and in keras (with tensorflow backend).

The scikit-learn web site is also a great resource to brush up on your ML skills. The following tutorials are recommended to learn more about pandas and scikit-learn:




The slack forum

During the challenge and the data camp we will be communicating through slack. The workspace URL is:


You should be able to register if your email address ends with:

  • telecom-paristech.fr
  • ensae-paristech.fr
  • polytechnique.edu
  • supelec.fr
  • ensae.fr
  • edu.ece.fr
  • ens-lyon.fr
  • ensta-paristech.fr
  • u-psud.fr
  • eleves.enpc.fr*

* Contact us if you cannot register.

You can also use it for communicating within and between teams.

Data sources

The following is a list of data sources that you may use in your team project. Note however that picking a nice data set and setting up a prediction problem is not enough for a good grade: you also have to make a reasonable business or science case.

Sequential learning and optimization

Instructor: 

ECTS: 5

Periods: P1 + P2

Syllabus:


– Know the framework of stochastic bandits with finitely many arms, and the framework of prediction of arbitrary sequences by aggregation of predictors
– Master the techniques for proving lower bounds on the regret
– Master the techniques for proving upper bounds on the regret

Course description

This is a very technical course, centered on mathematical proofs; no algorithm programming will be offered, only proofs, sometimes long and painful. The objective is to learn how to pose and model a sequential learning problem, to exhibit algorithms, computationally efficient if possible, that upper-bound the regret, and then to show the optimality of the bounds obtained by proving that no other algorithm can do better, in a sense to be made precise.

This approach (modeling, algorithm for the upper bound, universal lower bound) is the canonical approach for publishing results on a given problem. We will also see how to write proofs elegantly. This course is therefore of great interest to those who intend to pursue a PhD in mathematics (and to them only).


Mastery of, and perspective on, the following topics (note that having attended only a few hours of lectures on these topics is not sufficient):
– Measure theory (e.g. monotone class lemma, Radon-Nikodym theorem)
– Integration theory (e.g. Fubini’s theorems)
– Probability calculus (e.g. expectations conditional on a vector of random variables)
– Martingale theory (e.g. Doob’s theorems for super- or submartingales)
– Elementary notions of statistics (e.g. non-asymptotic confidence intervals à la Hoeffding)
– Ideally, elements of information theory (e.g. Kullback-Leibler divergence, Pinsker’s inequality)

Final grade

The course is assessed by a midterm exam and a final exam, each lasting about 3 hours. The exam cannot be replaced by a project, since the course is theoretical. Note that in previous years the exam was found very difficult to pass by students who had never studied martingale theory beforehand.

Bayesian Learning for partially observed dynamical systems

Instructor: Sylvain LE CORFF

Credits: 2.5 ECTS

Syllabus:

Inference and simulation of models with latent data with practical sessions in Python


For the past few years, statistical learning and optimization of complex dynamical systems with latent data, subject to mechanical stress and to random excitations that tend to be very noisy, have been applied to time series analysis across a wide range of applied science and engineering domains such as signal processing, target tracking, enhancement and segmentation of speech and audio signals, and inference of ecological networks.

Solving Bayesian nonlinear filtering and smoothing problems, i.e. computing the posterior distributions of some hidden states given a record of observations, and computing the posterior distributions of the parameters is crucial to perform maximum likelihood estimation and prediction of future states of partially observed time series. Estimators of these posterior distributions may be obtained for instance with Sequential Monte Carlo (SMC), also known as particle filtering and smoothing, and Markov Chain Monte Carlo (MCMC) methods.

For massive data sets and complex models, the dynamics of the underlying latent state or the conditional likelihood of the observations might be unavailable or might rely on black box routines/simulation programs which makes usual approaches unreliable. Construction of (a) estimators of these posterior distributions to estimate uncertainty or (b) simulation under the posterior distribution of the parameters are very complex challenges in this setting.
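
As an illustration of the Sequential Monte Carlo idea mentioned above, the following is a minimal sketch of a bootstrap particle filter. The linear Gaussian toy model, its parameter values and the function name are assumptions made for this example only; they are not taken from the course material.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500, a=0.9,
                              sigma_x=1.0, sigma_y=0.5, seed=0):
    """Return estimates of the filtering means E[X_t | Y_1..Y_t].

    Toy model (an assumption for this sketch):
        X_t = a * X_{t-1} + N(0, sigma_x^2),   Y_t = X_t + N(0, sigma_y^2).
    """
    rng = random.Random(seed)
    # Initial particles drawn from a standard normal prior on the state.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in observations:
        # 1. Propagate each particle through the state dynamics.
        particles = [a * x + rng.gauss(0.0, sigma_x) for x in particles]
        # 2. Weight each particle by the likelihood of the observation.
        log_w = [-0.5 * ((y - x) / sigma_y) ** 2 for x in particles]
        max_log_w = max(log_w)  # subtract the max for numerical stability
        weights = [math.exp(lw - max_log_w) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        # 3. Estimate the filtering mean with the weighted particles.
        means.append(sum(w * x for w, x in zip(weights, particles)))
        # 4. Multinomial resampling to limit weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means


# Simulate a short trajectory from the same toy model, then filter it.
rng = random.Random(1)
x, states, observations = 0.0, [], []
for _ in range(50):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    states.append(x)
    observations.append(x + rng.gauss(0.0, 0.5))

filtered = bootstrap_particle_filter(observations)
rmse_filter = math.sqrt(
    sum((f - s) ** 2 for f, s in zip(filtered, states)) / len(states)
)
```

The propagation step only requires the ability to simulate the state dynamics, which is one reason particle methods remain usable when the transition density is available only through a simulator.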


Every Tuesday (9 a.m. to 12:30 p.m.) from September 17 to October 22.

  • Markovian models (specific focus on observation-driven models).
  • Bayesian inference and consistency and asymptotic normality of the
    maximum likelihood estimator.
  • Introduction to Markov chain Monte Carlo algorithms.
  • Some convergence results of Markov chain Monte Carlo algorithms.
  • Particle Gibbs sampling, Particle marginal MCMC.
  • Approximate Bayesian Computation.


  • Master the statistical learning framework and its challenges with dependent
  • Know the inner mechanism of some classical Markovian models with missing
  • Know how to implement (Python) the most classical Markov chain Monte
    Carlo algorithms: Metropolis-Hastings, Gibbs, particle-based MCMC.
  • Understand some theoretical tools used to prove some convergence
    properties of Machine learning for such models (maximum likelihood inference,
    ergodicity of MCMC algorithms).
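
For orientation, here is a minimal sketch of one of the samplers listed above, random-walk Metropolis-Hastings. The standard normal target, the step size and the function names are illustrative assumptions, not the course's reference implementation.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # Symmetric proposal: accept with probability min(1, p_new / p_old).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        # The current state is recorded whether or not the move was accepted.
        samples.append(x)
    return samples


def log_std_normal(x):
    # Log-density of N(0, 1), up to an additive constant.
    return -0.5 * x * x


chain = metropolis_hastings(log_std_normal, x0=0.0, n_samples=20000)
kept = chain[2000:]  # discard a burn-in period
mean = sum(kept) / len(kept)
var = sum((v - mean) ** 2 for v in kept) / len(kept)
```

Only the target density up to a normalizing constant is needed, which is why the same loop applies to posterior distributions whose evidence term is intractable.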

Evaluation: Report on a research article (100%).

Big Data Frameworks

Instructor: Salim NAHLE

Credits: 5 ECTS

Enrollment cap: 30

Periods: P1 + P2



The objectives of this course are the following:

  • Discover the different components of a Big Data cluster and how they interact.
  • Understand Big Data paradigms.
  • Understand the benefits of open source solutions.
  • Develop a Big Data project from scratch
  • Master Spark, its data models and its different methods of operation
  • Learn how to use Spark to analyze data, develop Machine Learning pipelines and finally do streaming with Spark
  • Understand and implement distributed algorithms.
  • Understand the advantages of SQL/NOSQL databases.

Description of the course:

The module Big Data Frameworks is composed of two courses:

  • Big Data with Hadoop (27 hours)
  • Data Science with Spark (13 hours)

Big Data with Hadoop:

Apache Hadoop has been evolving as the Big Data platform on top of which multiple building blocks are being developed. This course presents the Hadoop ecosystem, Hadoop Distributed File System (HDFS) as well as many of the tools developed on it:

  • MapReduce and YARN
  • Hive and HBase
  • Kafka, Flume, NiFi, Flink, Oozie, etc.
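
To give a feel for the MapReduce paradigm before touching a cluster, the sketch below simulates its three phases in a single Python process. It does not use the Hadoop API; the function names and the input strings are hypothetical and serve only to show how map, shuffle and reduce fit together.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key, here by summing counts."""
    return {key: sum(values) for key, values in groups.items()}


# Each string stands in for a block of a file stored on a different node.
splits = ["big data with Hadoop", "data science with Spark", "Big Data frameworks"]
mapped = [pair for split in splits for pair in map_phase(split)]
word_counts = reduce_phase(shuffle_phase(mapped))
```

On a real cluster the map calls run in parallel next to the data blocks and the shuffle moves data across the network; here everything happens in memory, so only the programming model is illustrated.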

Students will also discover various subjects such as security, resource allocation and data governance in Hadoop.

Data Science with Spark:

Apache Spark is rapidly becoming the computation engine of choice for big data. This course presents:

  • Spark’s architecture and Spark Core: RDDs (Resilient Distributed Datasets), Transformations, and Actions
  • Spark and Structured Data: explore Spark SQL and Spark DataFrames
  • Spark Machine Learning libraries (MLlib and ML)
  • Spark Streaming


Java, Python, Machine Learning and basic knowledge in Linux system administration and SQL


The final mark of the module is a weighted average of 2 marks:

  • Big Data with Hadoop (weight 2)
  • Data Science with Spark (weight 1)

Each course is evaluated by a midterm exam (coefficient 0.3, 1 hour) and a continuous evaluation (coefficient 0.7, labs and mini-projects).


Convex Analysis and Optimization Theory

Instructors: Pascal BIANCHI & Olivier FERCOQ

Credits: 5 ECTS

Periods: P1 + P2

Syllabus:


– Master the mathematical tools for constructing convex optimization algorithms.
– Be able to prove the convergence of the iterates.
– Be able to numerically solve optimization problems involving non-differentiable, structured regularization terms.
– Get an introduction to distributed optimization and to programming with Hadoop Spark.

Course description

The course is NOT intended to provide the largest possible repertoire of algorithms. The aim is to step back and understand the mathematical foundations for constructing a broad class of iterative methods. After an introduction to convex analysis, we will see the conditions under which the convergence of a fixed-point algorithm can be proved. This general approach yields, as a corollary, the convergence of the emblematic proximal gradient algorithm. It also makes it possible to construct other, more general algorithms: primal-dual methods.
These methods solve optimization problems involving complex, structured regularizations, or constrained optimization problems. Such problems are frequently encountered in statistical learning, signal processing, and image processing.
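
As a concrete instance of the proximal gradient algorithm, here is a minimal sketch on a toy problem. The objective 0.5 * ||x - b||^2 + lam * ||x||_1, the step size and the function names are assumptions chosen because the minimizer is known in closed form, which makes the iteration easy to check.

```python
def soft_threshold(v, threshold):
    """Proximal operator of threshold * |.| applied to a scalar."""
    if v > threshold:
        return v - threshold
    if v < -threshold:
        return v + threshold
    return 0.0


def proximal_gradient(b, lam, step=0.5, n_iter=100):
    """Minimize 0.5 * ||x - b||^2 + lam * ||x||_1 by proximal gradient.

    For this toy objective the minimizer is soft_threshold(b_i, lam)
    in each coordinate, so the result can be verified directly.
    """
    x = [0.0] * len(b)
    for _ in range(n_iter):
        # Gradient step on the smooth part 0.5 * ||x - b||^2.
        grad = [xi - bi for xi, bi in zip(x, b)]
        forward = [xi - step * gi for xi, gi in zip(x, grad)]
        # Proximal step on the non-smooth part, scaled by the step size.
        x = [soft_threshold(fi, step * lam) for fi in forward]
    return x


b = [3.0, -0.2, 1.5, -2.0]
solution = proximal_gradient(b, lam=1.0)
closed_form = [soft_threshold(bi, 1.0) for bi in b]
```

Each iteration is one application of a fixed-point operator (a gradient step followed by a proximal step), which is the viewpoint the course uses to prove convergence.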

From a teaching standpoint, the course aims for a fair balance between theoretical foundations and applications. Two lab sessions will put into practice the numerical methods seen in class. They include an introduction to distributed and large-scale optimization with Hadoop Spark.

Prerequisites: none, apart from elementary knowledge of convex analysis: convex functions and sets, minimizers. The first lecture is devoted to a review.

Final grade: 3/4 exam (3h), 1/8 work during lab sessions, 1/8 lab report.

Deep Learning I
Introduction to Bayesian learning

Instructor: Anne SABOURIN

Credits: 2.5 ECTS

This course is an introduction to Bayesian methods for machine learning. To begin with, the main ingredients of Bayesian thinking are presented, along with typical situations where a Bayesian treatment of the learning task is useful. Particular attention is paid to the links between regularization methods and the specification of a prior distribution.
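
The link between regularization and priors can be checked numerically on a one-parameter example. The sketch below is written in Python rather than in the course's language (R), and the data, the variances and the function names are made up for illustration: with Gaussian noise and a zero-mean Gaussian prior on the weight, the maximum a posteriori estimate coincides with ridge regression for a penalty equal to the noise variance divided by the prior variance.

```python
def ridge_estimate(xs, ys, lam):
    """Minimizer of sum_i (y_i - w * x_i)^2 + lam * w^2 for a scalar weight w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def log_posterior(w, xs, ys, noise_var, prior_var):
    """Log posterior of w, up to a constant, for y_i ~ N(w * x_i, noise_var)
    and a Gaussian prior w ~ N(0, prior_var)."""
    log_lik = -sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / (2.0 * noise_var)
    log_prior = -w * w / (2.0 * prior_var)
    return log_lik + log_prior


xs = [0.5, 1.0, 1.5, 2.0]
ys = [1.1, 1.9, 3.2, 3.9]
noise_var, prior_var = 0.25, 1.0

# Ridge solution with the penalty implied by the prior.
w_ridge = ridge_estimate(xs, ys, lam=noise_var / prior_var)

# Numerical check: maximize the log posterior over a fine grid of candidates.
grid = [i / 10000.0 for i in range(0, 40001)]
w_map = max(grid, key=lambda w: log_posterior(w, xs, ys, noise_var, prior_var))
```

A tighter prior (smaller prior_var) corresponds to a larger penalty, so choosing the regularization strength amounts to stating how strongly the weight is believed to be near zero.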

The second part of this course concerns the computational challenges of Bayesian analysis. Major approximation methods such as variational Bayes, Markov chain Monte Carlo sampling and sequential sampling schemes are introduced and implemented in lab sessions.

Format: 6 x 3.5 hours + exam

Programming language: R

Grading: mini-project (lab) (40%) + written exam (60%)


  • Week 1: Bayesian learning: basics.
    Bayesian model, prior-posterior, examples. Point and interval estimation.

    Prior choice, examples, exponential family

    A glimpse at asymptotics and at computational challenges

    Reading: Berger (2013), chapter 1; Bishop (2006) chapters 1 and 2, Robert (2007) chapter 1, Ghosh et al. (2007), chapter 2 and Robert and Casella (2010), chapter 1 for basic R programming.

  • Week 2: Bayesian modeling and decision theory. Naïve Bayes, KNN, Bayesian linear regression, Bayesian decision theory.

    Reading: Bishop (2006), chapter 3; Berger (2013) chapter 4; Robert (2007) chapter 2.

  • Week 3: Lab session
  • Week 4: Approximation methods. EM and Variational Bayes, examples. Reading: Bishop (2006), chapter 10.
  • Week 5: Sampling methods. Monte Carlo methods, importance sampling, MCMC (Metropolis-Hastings and Gibbs), examples. If time allows: sequential methods (particle filtering).
    Reading: Robert and Casella (2010), (bits of) chapters 3, 4, 6, 7, 8.
  • Week 6: Lab session. Approximation and sampling methods.


Berger, J. O. (2013). Statistical decision theory and Bayesian analysis. Springer Science & Business Media.

Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.

Ghosh, J. K., Delampady, M., and Samanta, T. (2007). An introduction to Bayesian analysis: theory and methods. Springer Science & Business Media.

Robert, C. (2007). The Bayesian choice: from decision-theoretic foundations to computational implementation. Springer Science & Business Media.

Robert, C. and Casella, G. (2010). Introducing Monte Carlo Methods with R. Springer Science & Business Media.

* Programming language: R

* Grading: mini-projects 1/2 + written exam 1/2.

* Program


Introduction to Deep Learning with Python

Instructors: Charles Ollion & Olivier Grisel

ECTS: 2.5

Periods: P1 + P2

Syllabus:

Introduction to Graphical Models

Instructor: Umut Şimşekli

Credits: 2.5 ECTS

Syllabus:

  • Lecture 1: Introduction & Probability reminder
  • Lecture 2: Conditional independence, Directed and undirected graphical models
  • Lecture 3: Factor graphs, Sum-product algorithm, Exponential family distributions
  • Lecture 4: Gaussian Mixture Models and the Expectation-Maximization algorithm
  • Lecture 5: Applications of the Expectation-Maximization algorithm, Non-negative matrix factorization
  • Lecture 6: Hidden Markov Models
  • Lecture 7: Hidden Markov Models, Mini project
Machine Learning

Instructor: Stéphane CLÉMENÇON

Credits: 2.5 ECTS

Syllabus:

The Machine Learning course is dedicated to carrying out a complete Data Science project:

  • In this course unit, you will develop Machine Learning applications in groups to address business concerns of companies: framing the business case, data exploration and cleaning, choice of the scientific approach, numerical implementation of learning algorithms, performance analysis, interpretation of the work, pitch of the results, etc.
  • The course is run so as to prompt and encourage the participation of all students, teamwork and collective intelligence.
  • The course will be assessed through a Data Science project carried out in groups.
Hidden Markov models and sequential Monte Carlo methods

Instructor: Nicolas Chopin – ENSAE – CREST

ECTS: 2.5

Periods: P1 + P2

Syllabus:

Hidden Markov models (also called state-space models) are time series models involving a ‘signal’ (a Markov process X_t describing the state of a system) that is observed imperfectly and with noise in the form of data, e.g. Y_t = f(X_t) + U_t.

These models are widely used in many disciplines:

  • Finance: stochastic volatility (X_t is the unobserved volatility)
  • Engineering: target tracking (X_t is the position of a moving object whose trajectory we try to recover); speech recognition (X_t is a phoneme)
  • Biostatistics: ecology (X_t = population size)
  • Epidemiology (X_t = number of infected people)

The goal of this course is to present modern methods for the sequential analysis of such models, based on particle algorithms (sequential Monte Carlo). We will cover in particular the problems of filtering, smoothing, prediction, and parameter estimation. At the end of the course, we will briefly mention the extension of such algorithms to non-sequential problems, notably in
Prerequisites:

2nd-year (2A) course on simulation and Monte Carlo, or a similar course. The 3rd-year (3A) courses ‘Computational Statistics’ and ‘Bayesian Statistics’ are recommended but not required.

Learning outcomes:

At the end of the course, the student will be able to:

  • state the main properties of HMMs
  • implement a particle filter to filter and smooth a given HMM
  • estimate the parameters of such a model using different methods


  • 1. Introduction: definition of HMMs (hidden Markov models), main properties, notions of
    filtering, smoothing, and prediction, forward-backward formulas.
  • 2. Discrete HMMs, Baum-Petrie algorithm
  • 3. Linear Gaussian HMMs, Kalman algorithm
  • 4. SMC algorithms for filtering an HMM
  • 5. Estimation in HMMs
  • 6. Introduction to non-sequential applications of SMC algorithms
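
For the discrete case, the forward part of the forward-backward recursion can be sketched in a few lines. The two-state chain, its probabilities and the function name below are illustrative assumptions; `initial` is taken to be the distribution of the hidden state at the time of the first observation.

```python
import math


def forward_filter(initial, transition, emission, observations):
    """Forward recursion for a discrete HMM.

    Returns the filtering distributions P(X_t | Y_1..Y_t) and the
    log-likelihood of the observation sequence.
    """
    n_states = len(initial)
    predicted = list(initial)  # P(X_1), then P(X_t | Y_1..Y_{t-1})
    filtered, log_likelihood = [], 0.0
    for y in observations:
        # Update: multiply the prediction by the likelihood of y in each state.
        joint = [predicted[i] * emission[i][y] for i in range(n_states)]
        norm = sum(joint)  # equals P(Y_t | Y_1..Y_{t-1})
        log_likelihood += math.log(norm)
        current = [j / norm for j in joint]
        filtered.append(current)
        # Predict: push the filtering distribution through the transition matrix.
        predicted = [
            sum(current[i] * transition[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return filtered, log_likelihood


# Toy two-state chain with binary observations (all numbers are illustrative).
initial = [0.5, 0.5]
transition = [[0.9, 0.1], [0.2, 0.8]]
emission = [[0.8, 0.2], [0.3, 0.7]]  # emission[state][symbol]
filtered, log_lik = forward_filter(initial, transition, emission, [0, 0, 1])
```

Normalizing at every step keeps the recursion numerically stable on long sequences, and the normalizing constants accumulate into the likelihood used for parameter estimation.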

* Del Moral (2004). Feynman-Kac formulae, Springer.
* Cappé, Moulines and Ryden (2010) , Inference in Hidden Markov Models (Springer Series in

Auctions and Matching: learning and approximations

Instructor: Vianney PERCHET

ECTS: 2.5

Periods: P1 + P2

Syllabus:

Billions of auctions are run every day in the huge online advertisement market. They require a complete knowledge of the different mechanisms, of how to improve them using past data, and of how to learn “good/reasonable/optimal” strategies. Matchings are already quite important today (allocation of students to universities) and will certainly be used more and more on local markets.

During the 8 lectures, we will first introduce the general concept of mechanism design, and especially auctions (1st/2nd price, combinatorial, VCG) and (stable) matchings that are or can be used in practice. The main questions are their approximation, optimisation and learning based on past data in a dynamical setting. We will introduce and study the main classical tools, with a specific focus on prophet inequalities and secretary problems, as well as quick reminders on statistical theory, multi-armed bandits and online algorithms.
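
As a small illustration of the second-price mechanism mentioned above, here is a sketch of a sealed-bid single-item auction. The function names, the bidders and their bids are hypothetical, and ties are broken by insertion order, which is an arbitrary choice made for this example.

```python
def second_price_auction(bids):
    """Run a sealed-bid second-price (Vickrey) auction for a single item.

    bids maps each bidder name to a bid. The highest bidder wins and pays
    the second-highest bid.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    # sorted() is stable, so equal bids keep their insertion order.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner, _ = ranked[0]
    price = ranked[1][1]
    return winner, price


def utility(value, winner, price, bidder):
    """Quasi-linear utility: value minus price if the bidder wins, else zero."""
    return value - price if winner == bidder else 0.0


# Bidder "a" values the item at 10; compare a truthful bid with a shaded one.
others = {"b": 7.0, "c": 4.0}
truthful = utility(10.0, *second_price_auction({"a": 10.0, **others}), "a")
shaded = utility(10.0, *second_price_auction({"a": 6.0, **others}), "a")
```

In this instance shading the bid below the true value loses the item without lowering the price that would have been paid, which is the intuition behind truthful bidding in second-price auctions.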

This course will be at the intersection of mathematics (statistics, optimization), computer science (complexity, approximation), and economics (strategies, equilibria and applications). Yet it should not have strong prerequisites. The evaluation will be a written exam. 

At the end of this course, students will be able to

  • Compute optimal strategies and design mechanisms (especially auctions and matchings)
  • Find the sample complexity of approximate mechanisms
  • Design and analyse online algorithms
  • Prove and generalise prophet inequalities
Statistical Learning Theory

Instructor: Arnak DALALYAN

Credits: 2.5 ECTS

Periods: P1 + P2

Syllabus:

The main purpose of this course is to introduce the mathematical formalism of learning theory and to showcase its relations with the more classical statistical theory of nonparametric estimation.

  • Presentation of 3 central problems: regression, binary classification, clustering or density estimation. Connection between these problems.
  • Universal consistency. Overfitting and underfitting. The Hoeffding inequality and empirical risk minimisation. Rademacher complexities.
  • Density estimation by histograms. Bias-variance decomposition and the rate of convergence over Hölder classes.
  • Adaptive choice of the bandwidth by minimizing an unbiased estimate of the risk. Local choice of the bandwidth by the Lepski method.
  • Nonparametric regression and sparsity. Thresholding Fourier coefficients.
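
The Hoeffding inequality listed above can be checked numerically. The sketch below states the two-sided bound for the mean of n i.i.d. variables with values in [0, 1] and compares it with a Monte Carlo frequency for Bernoulli samples; the sample size, the deviation level and the function names are illustrative choices.

```python
import math
import random


def hoeffding_bound(n, eps):
    """Two-sided Hoeffding bound for the mean of n i.i.d. variables in [0, 1]:
    P(|mean - expectation| >= eps) <= 2 * exp(-2 * n * eps^2)."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * eps ** 2))


def empirical_deviation_frequency(n, eps, p=0.5, n_trials=2000, seed=0):
    """Monte Carlo estimate of P(|mean - p| >= eps) for Bernoulli(p) samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Sum of n Bernoulli(p) draws, divided by n.
        mean = sum(rng.random() < p for _ in range(n)) / n
        if abs(mean - p) >= eps:
            hits += 1
    return hits / n_trials


n, eps = 100, 0.15
bound = hoeffding_bound(n, eps)
frequency = empirical_deviation_frequency(n, eps)
```

The bound holds for every distribution supported on [0, 1], so for a specific distribution such as the Bernoulli used here the observed frequency is typically well below it.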
High-dimensional statistics

Instructor: Alexandre TSYBAKOV

Credits: 2.5 ECTS

Periods: P1 + P2


High-dimensional statistics is a recent field developed over the last decade. Its aim is to handle new kinds of data in which, for each individual, a large number of variables is observed, sometimes larger than the number of individuals in the sample. Of course, not all variables are relevant, and usually very few of them are. The notion of sparsity is therefore fundamental for interpreting high-dimensional data.

The goal of this course is to present some founding principles that emerge in this context. They are common to many recently introduced problems, such as high-dimensional linear regression, estimation of large low-rank matrices, and network models, for example stochastic block models. The emphasis will be on constructing methods that are optimal in terms of rate of convergence, and on their oracle properties.


  • Gaussian sequence model. Sparsity and thresholding procedures.
  • High-dimensional linear regression. BIC, Lasso, Dantzig selector and square-root Lasso methods. Oracle properties and variable selection.
  • Estimation of large low-rank matrices. Sparse PCA.
  • Inference on networks. Stochastic block model.


  • C. Giraud. Introduction to high-dimensional statistics. Chapman and Hall, 2015.
  • A. B. Tsybakov. Apprentissage statistique et estimation non-paramétrique. Lecture notes, Ecole Polytechnique, 2014.
  • S. van de Geer. Estimation and testing under sparsity. Lecture Notes in Mathematics 2159. Springer, 2016.
Optimization for Data Science

Instructors: Alexandre GRAMFORT and Robert GOWER

Credits: 5 ECTS

Periods: P1 + P2

Syllabus:
Modern machine learning relies heavily on optimization tools, typically to minimize so-called loss functions on training sets. The objective of this course is to cover the necessary theoretical results of convex optimization as well as the computational aspects. This course contains a fair amount of programming, as all algorithms presented will be implemented and tested on real data. At the end of the course, students will be able to decide which algorithm is best suited to a machine learning problem given the size of the data (number of samples, sparsity, dimension of each observation).

Evaluation:
Labs. 2-3 graded labs with Jupyter (30% of the final grade).

Project. Evaluation of Jupyter notebooks (30% of the final grade).

Project 2016: Implement SVM with the cvxopt package, derive the dual and implement the dual solution
(without intercept) with proximal gradient and coordinate descent (SDCA).

Exam. 3h exam (40% of the final grade).

To sum up:

Eval type   % of final grade   Remarks
Lab         30%                Students have 1 week to upload solutions on Moodle
Project     30%                To upload on Moodle by ???
Exam        40%                Final course


2nd period (November 18, 2019 to January 31, 2020)

Partially observed Markov chains in signal and image

Instructor: Wojciech Pieczynski

ECTS: 2.5

Syllabus:

Reinforcement learning

Instructor: Erwan LE PENNEC

ECTS: 2.5


This 20-hour course offers an introduction to reinforcement learning. It is based on the new edition of the book “Reinforcement Learning: An Introduction” by R. Sutton and A. Barto (available online at http://incompleteideas.net/book/the-book-2nd.html).

1. Introduction to reinforcement learning and Markov decision processes
2. The bandit case
3. Tabular methods: prediction by dynamic programming, Monte Carlo methods and TD
4. Planning and learning for tabular methods
5. Approximate methods: prediction, planning and learning
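
To illustrate tabular prediction by dynamic programming, here is a minimal sketch of iterative policy evaluation on a two-state toy problem. The states, rewards, discount factor and function name are assumptions made for this example; the policy is taken as fixed, so only its value function is computed.

```python
def policy_evaluation(transitions, gamma=0.9, tol=1e-10):
    """Iterative policy evaluation by dynamic programming.

    transitions[s] is a list of (probability, next_state, reward) triples
    describing what happens from state s under the fixed policy.
    """
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, outcomes in transitions.items():
            # Bellman expectation backup for state s.
            new_value = sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
            delta = max(delta, abs(new_value - values[s]))
            values[s] = new_value
        if delta < tol:
            return values


# Toy chain (illustrative): from "play", earn 1 and stay with probability 0.5,
# or stop with reward 0; "end" is absorbing with zero reward.
transitions = {
    "play": [(0.5, "play", 1.0), (0.5, "end", 0.0)],
    "end": [(1.0, "end", 0.0)],
}
values = policy_evaluation(transitions)
```

Here the fixed point can be found by hand: V(play) = 0.5 * (1 + 0.9 * V(play)), so V(play) = 0.5 / 0.55, which the iteration approaches geometrically.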

Generalisation properties of algorithms in ML
Theoretical guidelines for high-dimensional data analysis

Instructor: Christophe GIRAUD

Credits: 2.5 ECTS

Syllabus: 


Goal of the lectures:

  • to draw your attention to some issues in data analysis, and some proposals to handle them;
  • to learn to read a research paper, to catch the take-home message and to identify the limits;
  • to favor your own critical analysis.

The lecture will be based on some recent research papers. The presence during the lectures is mandatory and taken into account in the final evaluation.






Topic | Paper | Further reading
False discoveries, multiple testing, online issue | Paper 1 (short review) | Reliability of scientific findings? Online FDR control
Strength and weakness of the Lasso | Paper 1 | No free computational lunch
Adaptive data analysis | Paper 1 | Kaggle overfitting
Curse of dimensionality, robust PCA, theoretical limits | Paper 1 (suppl. material) | Robust PCA
Robust learning | Paper 1 | Learning with Median Of Means

Bootstrap and resampling methods in machine learning


Coordinator: F. PORTIER

Credits: 2.5 ECTS

Bootstrap and resampling methods

The purpose of this course is (i) to present the original Bootstrap theory (this will include applications in parameter estimation, regression and functional estimation) and (ii) to study different bootstrap methods that are employed in some machine learning algorithms. The content of the course will be mostly theoretical and students must have a strong background in mathematical statistics (basic probabilistic tools such as density, distribution, variance as well as convergence concepts including almost-sure and weak convergence).


  • Course 1. Nonparametric bootstrap (Efron’s method). Confidence interval for the empirical mean. Edgeworth expansion.
  • Course 2. Weighted bootstrap and Bayesian bootstrap. Proof of the CLT as an exercise.
  • Course 3. Bootstrap in regression. Parametric bootstrap.
  • Course 4. Empirical processes. Application to semiparametric model such as the Cox model. Local estimation (Nadaraya-Watson and k-NN).
  • Course 5. Cross validation.
  • Course 6. Bagging. Boosting.
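
The nonparametric bootstrap of Course 1 can be sketched in a few lines. The example below builds a percentile confidence interval for the mean; the function name `bootstrap_mean_ci`, the number of resamples and the simulated Gaussian data are assumptions made for illustration, and the percentile interval is only one of several bootstrap intervals.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean (Efron's method)."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample n points with replacement and record the mean of each resample.
    boot_means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    lo_idx = round((1.0 - level) / 2.0 * n_boot)
    hi_idx = round((1.0 + level) / 2.0 * n_boot) - 1
    return boot_means[lo_idx], boot_means[hi_idx]


# Simulated data (hypothetical): 200 draws from N(10, 2^2).
data_rng = random.Random(1)
sample = [data_rng.gauss(10.0, 2.0) for _ in range(200)]
low, high = bootstrap_mean_ci(sample)
```

The interval is read directly from the empirical quantiles of the resampled means, so no Gaussian approximation is used.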
Computer Vision
Instructor: Alasdair Newson

ECTS: 2.5

Syllabus:

Image processing basics and computer vision

Camera basics, sampling, filtering, radiometry, colour, image restoration, mathematical morphology, segmentation, image descriptors, object recognition and detection, motion estimation, background estimation

Nonparametric estimation

Coordinator: Cristina BUTUCEA

Credits: 2.5 ECTS

Syllabus:

High dimensional matrix estimation

Coordinator: Karim LOUNIC

Credits: 2.5 ECTS

Syllabus:

In several high-dimensional problems, the information of interest takes the form of a high-dimensional matrix with a low statistical complexity structure. In multivariate regression applications such as multitask collaborative filtering and recommender systems, we can measure the statistical complexity of the information matrix through group sparsity or the rank. In Principal Component Analysis, the goal is to learn the covariance structure of a high-dimensional random vector. In this problem, the statistical complexity is better quantified by the effective rank.

The goal of this course is to highlight how the complexity structure of a matrix conditions the design and the theoretical analysis of an estimation procedure. To this end, we will introduce several tools: oracle inequalities, minimax theory and concentration inequalities.

Program of the course:

  1. Multi-task regression
  2. Trace regression and matrix completion
  3. Covariance matrix estimation and Principal Component Analysis

Prerequisites: Basic knowledge of probability and optimization.

Final grade: Project


Instructor: François YVON

ECTS: 2.5

Syllabus:

Graphical models for large scale content access

Coordinator: François YVON

Credits: 2.5 ECTS

Syllabus:

  1. Introduction 
    Document classification, 
    Directed graphical models
  2. Topic models 
    Mixture of multinomial distributions 
    EM algorithm 
    PLSA model 
    LDA model
  3. Structured models 
    Linguistic dependencies, structure in graphical models 
    HMMs revisited 
    IBM1 and IBM2 alignment models
  4. Conditional models 
    Logistic regression and maximum entropy 
    CRF model
  5. Exact inference 
    Variable elimination 
    Message passing 
    Junction tree algorithm
  6. Approximate inference: variational methods 
    Belief propagation in cyclic graphs 
    Principles of variational inference 
    Application to LDA
  7. Approximate inference: sampling 
    Principles of sampling methods 
    Application to mixtures of multinomial distributions 
    Application to LDA
Operations research and massive data

Instructor: Zacharies ALES

ECTS: 2.5

Enrollment cap: 10 (excluding ENSTA students)

Syllabus:

Part one: Introduction to operations research (2.5 ECTS, mandatory for students who have not taken a course on integer linear programming)

– Graph algorithms – Linear programming – Integer linear programming

Part two: OR and massive data (2.5 ECTS)

I – A discrete optimization approach to associative classification

Presentation and implementation of a classification method based on the exact solution of discrete optimization problems, yielding a classifier that is both accurate and interpretable.

II – Algorithms for large real-world graphs

How can web pages be ranked by popularity? How can friend lists be built automatically on Facebook? Which products should be recommended to a user on Amazon? We will study a few graph algorithms and how to implement them so that they scale (1G links on a laptop).
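
One classical answer to the first question is PageRank computed by power iteration. The course description does not name the algorithm, so the choice of PageRank, the function name `pagerank`, the damping factor and the toy graph below are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank by power iteration on a graph given as {node: [successors]}."""
    nodes = sorted(set(links) | {v for targets in links.values() for v in targets})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Mass held by dangling nodes (no outgoing link) is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = links.get(u, [])
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy web graph (hypothetical): page "c" receives links from every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
```

Each iteration touches every link once, so the cost per pass is linear in the number of links, which is what makes the method usable on very large graphs.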

Introduction to Deep Learning with Python part 1

Instructors: Charles Ollion & Olivier Grisel

ECTS: 2.5

Syllabus:

Stochastic approximation and reinforcement learning

Instructors: Pascal Bianchi & Walid Hachem

ECTS: 2.5

Syllabus:

We first recall some fundamental results in probability theory (martingales, Markov chains, etc.). Next, we use these results to study the asymptotic behavior of iterative stochastic algorithms, i.e., algorithms for which each iteration depends on the realization of a random variable. This covers many applications (stochastic optimization for machine learning, reinforcement learning, game theory, etc.). We especially emphasize two applications: in optimization, we focus on the analysis of the stochastic gradient descent and its variants; in reinforcement learning, we analyze the convergence of temporal difference learning and Q-learning algorithms.

Prerequisite: Students are expected to have a good background in probability theory. It is strongly advised to follow the course on “Reinforcement learning” prior to this course. Although not mandatory, it is also advised to follow at least one of the following courses:

  • Hidden Markov chains and sequential Monte-Carlo
  • Partially observed Markov chains in signal and image
  • Bayesian Learning in partially observed evolving graphical models

Course schedule:

  • Applicative context and mathematical foundations.
  • The ODE method and almost sure convergence techniques in the decreasing step case.
  • Weak convergence techniques and the constant step case.
  • Fluctuations and saddle point avoidance.
  • Applications: Convex and non-convex optimization, Reinforcement learning, Temporal Difference learning, Q-learning
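
A minimal sketch of a stochastic approximation scheme in the decreasing step case: the Robbins-Monro iteration below minimises 0.5 * E[(theta - X)^2] from one noisy gradient per step. The function name `robbins_monro_mean`, the step schedule and the simulated data are assumptions made for illustration.

```python
import random


def robbins_monro_mean(samples, theta0=0.0, step=lambda n: 1.0 / n):
    """Stochastic gradient iteration theta <- theta - gamma_n * (theta - x_n)."""
    theta = theta0
    for n, x in enumerate(samples, start=1):
        # (theta - x) is an unbiased estimate of the gradient of
        # 0.5 * E[(theta - X)^2] evaluated at the current iterate.
        theta -= step(n) * (theta - x)
    return theta


# Simulated data (hypothetical): 10000 draws from N(3, 1).
rng = random.Random(0)
draws = [rng.gauss(3.0, 1.0) for _ in range(10000)]
estimate = robbins_monro_mean(draws)
```

With the step gamma_n = 1/n the iterate coincides with the running empirical mean, which makes this a convenient sanity check before trying other step schedules.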


3rd period (03/02/2020 to 03/04/2020)

Deep learning II

Coordinator: Yohan PETETIN

Credits: 2.5 ECTS

Syllabus:

Optimal Transport: Theory, Computations, Statistics, and ML Applications

Instructor: Marco CUTURI

ECTS: 2.5

Syllabus:

8 lectures + 4 practical sessions for a total of 18 hours

3 lectures on theory

  • Monge and Kantorovich Problems, duality in OT, 2-Wasserstein geometry and the Brenier theorem. 
  • Closed forms: Applications to transport between Gaussians, Transport in 1D.
  • Caffarelli contraction theorem, regularity theory (Figalli).

3 lectures on computations and statistics

  • Algorithmic overview: network flow solvers in the discrete world, Benamou-Brenier formula in the PDE world.
  • Statistical results and the curse of dimensionality
  • Regularized approaches to compute optimal transport.
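
The regularized approach can be illustrated with the Sinkhorn iterations for entropic optimal transport between two discrete measures. This is a minimal sketch: the function name `sinkhorn`, the regularisation strength `eps`, the iteration count and the toy point clouds are assumptions, and no log-domain stabilisation is included.

```python
import math


def sinkhorn(a, b, cost, eps=0.2, n_iter=1000):
    """Entropic optimal transport plan between histograms a and b."""
    n, m = len(a), len(b)
    # Gibbs kernel associated with the cost matrix.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternate scalings so that the row, then column, marginals match.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy problem (hypothetical): uniform measures on 5 and 4 points of [0, 1].
x = [i / 4 for i in range(5)]
y = [j / 3 for j in range(4)]
cost = [[(xi - yj) ** 2 for yj in y] for xi in x]
a = [1 / 5] * 5
b = [1 / 4] * 4
plan = sinkhorn(a, b, cost)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(5)) for j in range(4)]
```

Smaller values of `eps` give plans closer to the unregularized solution but slow the iterations down and can underflow, which is why practical implementations usually work in the log domain.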

2 lectures on applications

  • Handling measures with the Wasserstein geometry: computation of barycenters, clusters
  • Automatic differentiation with the Sinkhorn algorithm. Wasserstein regression
  • Wasserstein GANs
  • Applications to Biology (cell pathways) and NLP (alignment of multilingual corpora)

 4 practical sessions

  • 1D transport, transport between Gaussians, network flow solver type algorithms
  • Sinkhorn algorithm, color transfer, retrieval, biology.
  • Sorting using the Sin
Introduction to compressive sensing

Instructor: Guillaume LECUÉ

ECTS: 2.5

Syllabus:

The aim of this course is to study high-dimensional statistical problems in order to bring out three fundamental ideas of this field, which can then be applied to many other problems in data science. We briefly state these principles:

  1. A large amount of real data lives in high-dimensional spaces in which classical statistical methods are inefficient (cf. the curse of dimensionality). Nevertheless, most of these data are structured, so that the “true” dimension of the problem is no longer that of the ambient space but rather that of the structure containing the useful information in the data. Such data are called structured or sparse. Building bases or dictionaries that reveal the low-dimensional structures of these data is an important component of high-dimensional statistics.
  2. At first sight, finding these low-dimensional structures seems to require a combinatorial search in a high-dimensional space. Such procedures cannot be used in practice. An important component of high-dimensional statistics is therefore to propose and analyse algorithms that can be implemented even in high-dimensional spaces. Two approaches have received particular attention for this purpose: convex relaxation (coupled with the toolbox of convex optimization) and iterative algorithms, which can sometimes solve non-convex optimization problems.
  3. Finally, the third component is the role played by randomness in high-dimensional statistics. It turns out that low-dimensional structures are generally revealed by random objects and that, so far, no one knows how to exhibit these structures with deterministic measurements as efficiently as random matrices do, for example.

A course on high-dimensional statistics can therefore cover several areas of mathematics, including approximation theory, convex optimization and probability. In this course, we will mainly study the algorithmic and probabilistic aspects of this theory. Approximation theory will only be touched upon very briefly, through the example of images.

This course approaches the paradigm of high-dimensional statistics mainly through three topics:

  1. Compressed sensing: the problem of exact and approximate reconstruction of a high-dimensional signal from a small number of linear measurements of this vector, knowing that it has a small support;
  2. matrix completion / recommender systems: how to complete a matrix from the observation of a small number of its entries, knowing that this matrix has low rank;
  3. community detection in graphs: finding the high-density subgraphs in 'large' graphs.

We thus approach high-dimensional statistics through three key objects/data types for data science: high-dimensional but sparse vectors, large but low-rank matrices and, finally, 'large' graphs whose nodes are organized into communities.

The Compressed Sensing problem will be used as the main teaching vehicle for learning the three key ideas of high-dimensional statistics mentioned above. We will therefore devote 8 sessions to it, divided as follows: 5 (or 4) lecture sessions, 2 exercise sessions and 1 (or 2) computer lab sessions (the time devoted to labs will depend on the students' preferences as the course progresses). We will then devote the last 4 sessions to the matrix completion and community detection problems: 1 lecture/exercise session and 1 computer session for each of the two topics.

From the point of view of mathematical techniques, we will emphasize the following topics:

  1. concentration of random variables and complexity computation;
  2. methods and analysis of algorithms in convex optimization.

The computer lab sessions will be carried out in Python. Particular emphasis will be placed on the classical data science libraries: sklearn, cvxopt and networkx.
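
As an illustration of convex relaxation combined with an iterative algorithm, the sketch below recovers a sparse vector from a few random linear measurements by solving the l1-penalised least-squares problem with ISTA (iterative soft-thresholding). ISTA is not named in the course description; the function names, the penalty `lam`, the conservative step size and the problem sizes are assumptions, and the code uses plain Python lists rather than the libraries mentioned above.

```python
import random


def soft_threshold(value, threshold):
    """Proximal operator of threshold * |.| applied to a scalar."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_objective(A, y, x, lam):
    residual = [sum(a * xj for a, xj in zip(row, x)) - yi for row, yi in zip(A, y)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(xj) for xj in x)


def ista(A, y, lam=0.01, n_iter=2000):
    """Minimise 0.5 * ||A x - y||^2 + lam * ||x||_1 by proximal gradient steps."""
    m, d = len(A), len(A[0])
    # ||A||_F^2 upper-bounds the Lipschitz constant of the gradient, so this
    # step is conservative but guarantees a monotone decrease of the objective.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * d
    for _ in range(n_iter):
        residual = [sum(a * xj for a, xj in zip(row, x)) - yi for row, yi in zip(A, y)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(d)]
        x = [soft_threshold(xj - step * g, step * lam) for xj, g in zip(x, grad)]
    return x


# Toy problem (hypothetical): 15 Gaussian measurements of a 2-sparse vector in R^30.
rng = random.Random(0)
m, d = 15, 30
A = [[rng.gauss(0.0, 1.0) / m ** 0.5 for _ in range(d)] for _ in range(m)]
x_true = [0.0] * d
x_true[3], x_true[17] = 1.0, -1.0
y = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]
x_hat = ista(A, y)
# Indices of the two largest entries, to be compared with the true support {3, 17}.
largest = sorted(sorted(range(d), key=lambda j: -abs(x_hat[j]))[:2])
```

The random measurement matrix plays the role described in the third principle: it is the randomness of `A` that makes the l1 relaxation succeed with far fewer measurements than unknowns.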

By the end of this course, students should be able to:

  1. identify the computational properties of certain optimization problems, in particular identify the computationally hard problems,
  2. find convex relaxations for these (non-convex) optimization problems,
  3. understand the role of randomness in these problems, in particular for building suitable measurement vectors,
  4. use the compressed sensing, matrix completion and community detection problems as standard problems,
  5. build algorithms that approximate the solutions of the optimization problems “con-


Kernel Techniques with Information Theoretical Applications

Instructor: Zoltán Szabó

ECTS: 2.5

Syllabus:


To master the mathematical tools of learning with kernels, with particular focus on the estimation of divergence and statistical independence measures, hypothesis testing on structured domains, and their applications.

Description of the course

Kernel methods are among the most popular and influential tools in machine learning and statistics, with superior performance demonstrated in a large number of areas and applications. The key idea of kernels is to extend the traditional notion of inner product and hence data analysis from R^d to various domains, including for example time series, strings, sets, random variables, rankings, or graphs.

The course is divided into two parts:

  1. The first part is dedicated to the construction of kernels, the associated reproducing kernel Hilbert space (RKHS), and their fundamental properties. We will cover more classical applications of these tools in kernel based dimensionality reduction and supervised learning such as kernel (i) principal component analysis (KPCA), (ii) ridge regression, (iii) classification, or (iv) (structured) sparse coding.  
  2. The second part of the class is geared towards applications of kernel techniques in divergence/dependency estimation and hypothesis testing. Emerging methods in these fields can not only lead to state-of-the-art algorithms in several contexts (such as blind signal separation, feature selection, outlier-robust image registration, remote sensing, criminal data analysis), but they also come with elegant performance guarantees, complementing the regular statistical tools restricted to unstructured Euclidean domains. We are going to construct features of probability distributions which will enable us to define easy-to-estimate distances of random variables  and independence measures. As a byproduct, we will get non-parametric extensions of the classical t-test (two-sample test), the Pearson correlation test (independence test), and goodness-of-fit test. We will cover both quadratic-time and recent accelerated (linear-time) techniques. In this part we are going to learn about kernel canonical correlation analysis (KCCA), mean embedding, maximum mean discrepancy (MMD), integral probability metrics, characteristic and universal kernels, Hilbert-Schmidt independence criterion (HSIC), Stein discrepancy, energy distance and distance covariance. 
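
As a small illustration of the second part, the sketch below computes the biased (V-statistic) estimator of the squared maximum mean discrepancy with a Gaussian kernel, which runs in quadratic time. The function names, the bandwidth and the simulated samples are assumptions made for the example; a real two-sample test would also need a threshold, for instance from a permutation procedure.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, kernel=gaussian_kernel):
    """Biased estimate of MMD^2: squared RKHS distance between mean embeddings."""
    kxx = sum(kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


# Simulated samples (hypothetical): two from N(0, 1) and one from N(2, 1).
rng = random.Random(0)
p_sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
p_sample_bis = [rng.gauss(0.0, 1.0) for _ in range(200)]
q_sample = [rng.gauss(2.0, 1.0) for _ in range(200)]
same = mmd2_biased(p_sample, p_sample_bis)
different = mmd2_biased(p_sample, q_sample)
```

The statistic stays close to zero when both samples come from the same distribution and grows when they differ, which is the basis of the kernel two-sample test.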


The course requires a basic knowledge of probability theory and functional analysis. 


Exam: Project. Topics: 

  • applications: feature selection, outlier-robust image registration, criminal data analysis, remote sensing, natural language processing, media analysis (image/video), blind source separation. 
  • tools: KPCA, KCCA, kernel ridge regression and classification, kernel structured sparse coding, MMD, HSIC, Stein discrepancy, energy distance, distance covariance, two-sample, independence and goodness-of-fit testing. 
Machine Learning Business Case

Instructors: Alexandre ARAUJO & Ghislain de PIERREFEU

ECTS: 2.5

Syllabus:

Missing Data and causality

Instructor: Julie JOSSE

ECTS: 2.5

Syllabus:

Mixed effects models: methods, algorithms and applications in life sciences

Instructor: Marc LAVIELLE

ECTS: 2.5

Syllabus:


Population models describe biological and physical phenomena observed in each of a set of individuals, and also the variability between individuals. This approach finds its place in domains like pharmacometrics when we need to quantitatively describe interactions between diseases, drugs and patients. This means developing models that take into account that different patients react differently to the same disease and the same drug. The population approach can be formulated in statistical terms using mixed effects models.

Such a framework allows one to represent models for many different data types including continuous, categorical, count and time-to-event data. This opens the way for the use of quite generic methods for modeling these diverse data types. In particular, the SAEM (Stochastic Approximation of EM) algorithm is extremely efficient for maximum likelihood estimation and has been proven to converge in quite general settings. SAMBA (Stochastic Approximation for Model Building Algorithm) makes it possible to build a mixed effects model automatically by optimizing a penalized likelihood criterion in an iterative way. Once the model is built, it must be validated, i.e. each of the hypotheses made on the model must be tested. We will see how to construct unbiased hypothesis tests in the framework of mixed effects models.

All these algorithms are implemented in software tools (R packages, Monolix) that will be used for modelling and simulating pharmacokinetics and pharmacodynamics, infectious disease or tumor growth processes.

Multi-object estimation and filtering

Instructor: Daniel CLARK

ECTS: 2.5

Syllabus:

1 Overview

Methods for estimating multiple objects from sensor data are in increasing demand and are critically important for national security. For example, the increasing use of space for defence and civil applications makes it imperative to protect space-based infrastructure. Advanced surveillance capabilities are needed to be able to identify and monitor activities in Earth’s orbit from a variety of different sensing platforms and modalities.

There have been a number of important innovations in multitarget tracking and multisensor fusion in recent years that have had significant international impact across different application domains. In particular, a suite of mathematical tools, such as point process models, has been developed specifically to enable such innovations. Considering systems of multiple objects with point process models adopted from the applied probability literature enables advanced models to be constructed in a simple way. This course draws together mathematical concepts from diverse domains to provide a strong grounding for developing new algorithms for practical applications.

This course will investigate mathematical concepts in multiobject estimation to enable prospective researchers to better understand and contribute to innovations in this field. The goal is to develop a broad mathematical perspective for mathematical modelling for multi-object estimation and explore the literature in spatial statistics and point processes to aid new advances in sensor fusion for the development of future technologies for autonomous systems.

2 Course content 

The topics have been selected to cover the fundamental topics required for the development and implementation of practical algorithms for multi-sensor fusion. The course will cover fundamental mathematical topics, in estimation theory, information theory, and point process theory as follows. 

  • Bayesian filters: Kalman filter, extended Kalman filter, unscented Kalman filter, sequential Monte Carlo (particle) filtering, Gaussian mixture filtering.
  • Performance bounds and analysis: Fisher information, Cramér-Rao lower bound, consistency and bias.
  • Topics in combinatorics: generating functions, Bell polynomials, partitions.
  • Topics in functional calculus: differentials, functional derivatives, generating functionals.
  • Point process statistics: the intensity function, covariance and correlation, moments and cumulants.
  • Point process descriptions: the probability generating functional, the Laplace functional.
  • Point process parameterisations: Bernoulli, Poisson, Panjer, i.i.d. cluster process, Poisson-binomial.
  • Topics in multi-target tracking: modelling and derivation of point process filters, application with Gaussian mixture and particle filters.
  • Metrics: mean-squared error, Hausdorff distance, OSPA metric.
  • Practical applications: simultaneous localisation and mapping (SLAM), tracking multiple targets and camera calibration, distributed multi-sensor multi-target tracking.
  • Topics in information: Shannon entropy, Kullback-Leibler divergence, Rényi entropy, mutual information, channel capacity.
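
The first item of the list, the Kalman filter, can be sketched in its simplest scalar form: a single object whose state follows a slow random walk and is observed with additive noise. The function name `kalman_1d`, the noise variances `q` and `r` and the simulated measurements are assumptions made for illustration; the multi-object filters of the course build on this single-object recursion.

```python
import random


def kalman_1d(measurements, q=1e-5, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_k = x_{k-1} + w_k, z_k = x_k + v_k.

    q is the process noise variance, r the measurement noise variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q                # prediction: the state variance grows by q
        k = p / (p + r)          # Kalman gain
        x = x + k * (z - x)      # update with the innovation z - x
        p = (1.0 - k) * p        # posterior variance
        estimates.append(x)
    return estimates, p


# Simulated measurements (hypothetical): constant position 1.0, noise std 0.5.
rng = random.Random(0)
true_position = 1.0
zs = [true_position + rng.gauss(0.0, 0.5) for _ in range(200)]
estimates, final_var = kalman_1d(zs)
```

The gain shrinks as the posterior variance decreases, so later measurements move the estimate less than early ones.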

3 Method of delivery 

The course will comprise 15h of lectures and 15h of tutorial and practical work. The assessment will be 30% coursework and 70% exam.

4 Industrial engagement

Competency in this domain is in high demand for defence and national security organisations. A workshop showcasing work in industry is planned and opportunities for collaboration on projects with industrial and governmental partners will be communicated to students. Provisional commitments for opportunities have been provided by CNES, NATO CMRE, Naval Group, SAFRAN, Fraunhofer FKIE, Thales, Dstl and AFRL.


Natural language processing

Instructor: Chloé CLAVEL

ECTS: 2.5

Syllabus:

Text mining is a progressing and challenging domain. For example, a lot of efforts have been recently dedicated to the development of methods able to analyze opinion data available on the social Web. The first objective of this course is to tackle the different methods of language processing and machine learning underlying text and opinion mining. 

During this course, the students will acquire theoretical and technical skills in advanced machine learning methods for natural language processing.

The techniques and concepts that will be studied include:

  • natural language pre-processing : tokenization, part-of-speech tagging, document representation and word embeddings techniques
  • natural language resources : lexicons, wordnet
  • text clustering and text categorization : advanced machine learning methods such as deep learning.
Big Data & Insurance Project

Instructor: Denis OBLIN

ECTS: 2.5

Syllabus:

The aim of the course is to present how a data project is built in the insurance sector.

Emphasis will be placed on the difficulties encountered:

  • data collection
  • organizational and relational issues between departments
  • maturing of the needs behind the initial intuition
  • coaching data scientists beyond their code

Professionals may share their feedback on setting up Big Data projects in their companies.

The topics covered in the course will be:

  • Insurers: what data assets do they hold today?
  • The 3 levels of creating company value from data (operational, new services, new business model); value creation at the operational level: overview of data use, department by department, in an insurance company; other levels of value creation: overview of insurtech start-ups and their use of data (alan, simply, …)
  • What data organization? Is a datalab needed? How should it be managed?
  • Big Data / actuarial science: complementary or competing? How can they be brought together?
  • What does the law say about the use of data: GDPR, the CNIL insurance compliance pack, the insurance distribution directive, …?
  • What connected objects change for insurance: customer relationship, risk assessment, new insurable matter, short-term insurance.
Data infrastructure

Instructor: Nicolas TRAVERS

ECTS: 2.5

Syllabus:


The “Data infrastructure” course describes how to model data and distribute them efficiently in a distributed infrastructure dedicated to large-scale data management.
Through denormalization techniques, query optimization via sharding and indexing, the study of information system constraints (consistency, persistence, distribution, etc.), and the study of existing solutions and their specific features, the course will enable students to get to grips with the NoSQL ecosystem for data management and to choose a data infrastructure suited to their own querying needs.

In detail:

  • Overview of distributed databases
  • Data placement and sharding – HDFS, clustered index, consistent hashing
  • Data modelling – relational => column-oriented / document / graph
  • Denormalization method and NoSQL cost model
  • Characteristics of NoSQL solutions: MongoDB, Cassandra, Neo4j, Elasticsearch
  • Hands-on practice:
    • Getting started with a real use case: data model and queries
    • Denormalization of the use case
    • Computing the cost of the solution
    • Implementing the queries in a dedicated environment
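
Consistent hashing, listed above among the data placement techniques, can be sketched as a ring of hashed virtual nodes. The class name `ConsistentHashRing`, the number of replicas, the choice of SHA-256 and the node and key names are assumptions made for illustration; production systems such as Cassandra use their own partitioners.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hash ring with virtual nodes: a key goes to the next point on the ring."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (hash value, node name)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Each physical node owns several points to balance the load.
        for r in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{r}"), node))

    def get_node(self, key):
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


# Hypothetical cluster: three nodes, then a fourth one joins.
keys = [f"user:{i}" for i in range(1000)]
ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
before = {k: ring.get_node(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

When a node joins, only the keys that now fall on its points change owner, whereas a plain `hash(key) % n` placement would reshuffle most keys.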


Assessment:

  • ½: Presentation of the use case and of the infrastructure choices
  • ½: Written exam, all documents allowed, computer not connected to the network.


Structured Data : learning and prediction

Instructor: Florence D’ALCHÉ-BUC

ECTS: 2.5

Syllabus:


Systems for Big Data Analytics

Instructor: Angelos ANADIOTIS

ECTS: 2.5

Syllabus:


Tail events analysis: Robustness, outliers and models for extreme values

Instructor:

ECTS: 2.5

Syllabus:

Analysis of events in the tail of the distribution constitutes an important topic in statistics. The aim of the course is threefold: construction of estimators that are less sensitive to data contamination, identification of outlying observations, and modeling of extreme events.

The course starts with an introduction to robust statistics and measures of robustness, where the concepts of the influence function and of the breakdown value are given. Then, the simplest univariate robust estimators of location, scale, and skewness, which behave consistently even if the data are contaminated, are considered. These are further generalised to robust estimators of multivariate location and scatter (Stahel-Donoho estimator, minimum covariance determinant estimator, S-estimators, MM-estimators). Robust regression and PCA estimators are also considered.
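
The idea of a breakdown value can be illustrated with the simplest univariate robust estimators: the median for location and the median absolute deviation (MAD) for scale. The labs of this course use R; Python is used here only for consistency with the other examples on this page, and the function name `mad`, the consistency factor 1.4826 (valid for Gaussian data) and the toy data are assumptions made for the example.

```python
import statistics


def mad(values, scale=1.4826):
    """Median absolute deviation, rescaled to estimate a Gaussian standard deviation."""
    med = statistics.median(values)
    return scale * statistics.median([abs(v - med) for v in values])


# Toy data (hypothetical): eight clean measurements and one gross outlier.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
contaminated = clean + [1000.0]

mean_value = statistics.fmean(contaminated)      # pulled far away by the outlier
median_value = statistics.median(contaminated)   # stays close to 10
std_value = statistics.stdev(contaminated)       # inflated by the outlier
mad_value = mad(contaminated)                    # stays small
```

A single contaminated observation is enough to move the mean and the standard deviation arbitrarily far, while the median and the MAD tolerate up to half of the sample being contaminated.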

An important notion in robust statistics, data depth, is then introduced for the multivariate and functional frameworks. The presentation of the concept of a data depth function is followed by a study of the most important depth notions, such as Tukey or projection depth, and (multivariate) functional depths. The material above is then applied to the detection of outliers in multivariate and functional data, as well as cell-wise outliers.

Finally, an introduction to extreme value analysis is given. Here, extremes are defined as the largest values of a considered dataset. Extreme value theory suggests natural models for block-maxima and excesses above large thresholds, which give rise to estimates of quantities of major interest for risk management, such as high quantiles, large return levels, or tail probabilities outside the range of observed data. This aims to provide students with guidelines for applying extreme value models to answer such practical questions.

Format: 6 × 3.5 hours + exam

Programming language: R


Grading: 60% Exam, 40% Lab II

Syllabus:

  • Week 1: Introduction to robust statistics.
    Measures of robustness: influence function, breakdown value. Univariate robust estimators of location, scale, skewness. Multivariate location and scatter estimators. Robust regression and robust PCA.
    Reading: Rousseeuw and Leroy (1987); Huber and Ronchetti (2009); Wilcox (2016); Rousseeuw and Driessen (1999); Hubert et al. (2005); Cornillon et al. (2012) for (basic) R programming.
  • Week 2: Lab session I.
    Univariate robust estimation.
    Robust multivariate estimation: projection pursuit, minimum covariance determinant estimator.
    Robust regression and ROBPCA.
  • Week 3: Data depth
    Statistical data depth function: definition and properties, chosen notions.
    Data depth in infinite-dimensional setting.
    Identification of multivariate (row-wise and cell-wise) and functional outliers.
    Reading: Becker et al. (2013); Wilcox (2016);
    Tukey (1975); Donoho and Gasko (1992); Zuo and Serfling (2000); Hubert et al. (2015); Rousseeuw and Bossche (2018).
  • Week 4: Extreme value statistics.
    The one-dimensional case, distribution of maxima of large datasets, excesses above high thresholds.
    Inference methods (tail index estimation, block maxima, peaks-over-threshold). Case studies.
    Reading: Coles et al. (2001); Beirlant et al. (2006); Resnick (2013); De Haan and Ferreira (2007).
  • Week 5: Multi-dimensional setting.
    Regular variation. Exponent and angular measure.
    Reading: Resnick (2007, 2013).
  • Week 6: Lab session II.
    Data depth: Tukey, projection, spatial depths, functional depths, applications to outlier detection.
    Extreme value analysis: Return levels, probabilities of failure.



Becker, C., Fried, R., and Kuhnt, S. E. (2013). Robustness and Complex Data Structures: Festschrift in Honour of Ursula Gather. Springer, Berlin–Heidelberg.

Beirlant, J., Goegebeur, Y., Segers, J., and Teugels, J. L. (2006). Statistics of extremes: theory and applications. John Wiley & Sons.

Coles, S., Bawa, J., Trenner, L., and Dorazio, P. (2001). An introduction to statistical modeling of extreme values, volume 208. Springer.

Cornillon, P., Guyader, A., Husson, F., Jegou, N., Josse, J., Kloareg, M., Matzner-Lober, E., and Rouvière, L. (2012). R for Statistics. Chapman and Hall/CRC, New York.

De Haan, L. and Ferreira, A. (2007). Extreme value theory: an introduction. Springer Science & Business Media.

Donoho, D. L. and Gasko, M. (1992). Breakdown properties of location estimates based on halfspace depth and projected outlyingness. The Annals of Statistics, 20(4):1803–1827.

Huber, P. J. and Ronchetti, E. M. (2009). Robust Statistics. Second Edition. John Wiley & Sons, Hoboken.

Hubert, M., Rousseeuw, P. J., and Branden, K. V. (2005). Robpca: A new approach to robust principal component analysis. Technometrics, 47(1):64–79.

Hubert, M., Rousseeuw, P. J., and Segaert, P. (2015). Multivariate functional outlier detection. Statistical Methods and Applications, 24(2):177–202.

Resnick, S. I. (2007). Heavy-tail phenomena: probabilistic and statistical modeling. Springer Science & Business Media.

Resnick, S. I. (2013). Extreme values, regular variation and point processes. Springer.

Rousseeuw, P. J. and Van den Bossche, W. (2018). Detecting deviating data cells. Technometrics, 60(2):135–145.

Rousseeuw, P. J. and Van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41(3):212–223.

Rousseeuw, P. J. and Leroy, A. M. (1987). Robust Regression and Outlier Detection. John Wiley & Sons, New York.

Tukey, J. W. (1975). Mathematics and the picturing of data. In James, R., editor, Proceedings of the International Congress of Mathematicians, volume 2, pages 523–531. Canadian Mathematical Congress.

Wilcox, R. (2016). Introduction to Robust Estimation and Hypothesis Testing. 4th Edition. Academic Press, Amsterdam.

Zuo, Y. and Serfling, R. (2000). General notions of statistical depth function. The Annals of Statistics, 28(2):461–482.

60% Exam, 40% Lab II

Time series for financial data

Instructor: François ROUEFF

ECTS: 2.5

Syllabus:

General objectives 

The main goal of this course is to introduce and explain the statistical methods used for the analysis and forecasting of certain financial time series. This domain of application has given rise to substantial modeling efforts in recent decades, which allow one to consider many types of financial time series (price returns, rates, transaction data): linear time series, conditionally heteroscedastic time series, multivariate time series, discrete time series and so on. The main classes of linear and non-linear models will be introduced, as well as the statistical methods associated with them.

The main prerequisites for this course are the basics of linear algebra, Hilbert geometry, probability and statistics.

Core information 

Schedule: “Semaine intensive 2” 

Evaluation: Case study report in addition to an oral or a written exam. 


The precise outline is as follows. 

  • Day I: Crash course on financial time series.

Lecture 1 An introduction using the likelihood as a guideline. In this first lecture, we use the likelihood function as a guideline for statistical modeling of time series. The likelihood function is essential for statistical inference as it fully exploits the way one models the data. The goal of this lecture is to understand why and how, in the context of time series, the dynamics contained in the data appear in the likelihood. The outline is as follows:

1) Examples of financial time series.
2) Reminders: i.i.d. models.
  a) Univariate models.
  b) Multivariate models.
  c) Regression model.
  d) Hidden variables.
3) Introducing dynamics.
  a) What’s wrong with i.i.d. models?
  b) Univariate models.
  c) Multivariate models.

Lecture 2 Stationary and weakly stationary time series. Stationarity is an assumption underlying all statistical inference procedures. This notion has to be understood in the context of time series, where the dynamics of the data are essential in the modeling, as seen in Lecture 1. The goal of this lecture is to define stationary and weakly stationary time series and to understand these definitions through examples and practical questions such as detrending. We also introduce the main tools for the statistical analysis of linear L2 time series.

1) Stationary time series
  a) The statistical approach
  b) Classical steps of statistical inference
  c) Random processes in a nutshell
  d) Examples
  e) Stationary time series
2) Weakly stationary time series
  a) L2 processes
  b) Weak stationarity
  c) Spectral measure
  d) Empirical estimation

  • Day II: Linear prediction
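
To make the "empirical estimation" item of Lecture 2 concrete, the following sketch computes the empirical autocovariance and autocorrelation of an observed series. The normalization by n rather than n - h is the usual convention but is an assumption here, as are the function names and the toy data.

```python
def empirical_autocovariance(x, max_lag):
    """Empirical autocovariance at lags 0..max_lag, normalized by n."""
    n = len(x)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")
    mean = sum(x) / n
    centered = [value - mean for value in x]
    return [
        sum(centered[t + h] * centered[t] for t in range(n - h)) / n
        for h in range(max_lag + 1)
    ]


def empirical_autocorrelation(x, max_lag):
    """Empirical autocorrelation: autocovariance divided by the lag-0 value."""
    acov = empirical_autocovariance(x, max_lag)
    return [value / acov[0] for value in acov]


# Toy series (assumption), chosen so the arithmetic is easy to check by hand.
acov = empirical_autocovariance([1.0, 2.0, 3.0, 4.0], max_lag=2)
print(acov)  # [1.25, 0.3125, -0.375]
```

Dividing by n keeps the estimated autocovariance sequence non-negative definite, which matters when it is fed to prediction algorithms.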

Lecture 3 Linear prediction and ARMA models. Linear prediction relies only on the second order properties of the process. Practical algorithms such as the Levinson algorithm or the Innovation algorithm are derived in this context. The objects developed for linear prediction, such as correlation, partial correlation and innovation, are also used for defining and understanding AR, MA, and ARMA models. In order to understand the properties of these models, some preliminary work is required on general ℓ1 convolution filters. ARMA models are widespread parametric linear models for time series. They can be characterized easily using the autocovariance function and the partial autocorrelation function.

1) Linear prediction
  a) Prediction vs. linear prediction
  b) Linear prediction for weakly stationary processes
  c) Innovation process
2) Composition and inversion of ℓ1 convolution filters
  a) Example
  b) General results
  c) Inversion of a finite order filter
3) ARMA processes
  a) ARMA equations, stationary solutions
  b) Innovations of ARMA processes
  c) Characterization of MA processes
  d) Characterization of AR processes
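
The Levinson algorithm mentioned in Lecture 3 can be sketched as follows, in its standard Levinson-Durbin form. This is an illustrative version under stated assumptions, not the course's own code: the function name, the return values, and the AR(1) example are all hypothetical choices.

```python
def levinson_durbin(gamma, order):
    """Solve the linear prediction equations recursively.

    gamma: autocovariances gamma[0], ..., gamma[order].
    Returns (coefficients of the best linear predictor of the given order,
             partial autocorrelations at lags 1..order,
             variance of the prediction error).
    """
    if len(gamma) <= order or gamma[0] <= 0:
        raise ValueError("need gamma[0] > 0 and at least order + 1 values")
    phi = []           # predictor coefficients at the current order
    pacf = []          # partial autocorrelations found so far
    sigma2 = gamma[0]  # prediction error variance at order 0
    for k in range(1, order + 1):
        acc = gamma[k] - sum(phi[j] * gamma[k - 1 - j] for j in range(k - 1))
        kappa = acc / sigma2  # partial autocorrelation at lag k
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        sigma2 *= 1.0 - kappa ** 2
        pacf.append(kappa)
    return phi, pacf, sigma2


# Example (assumption): AR(1) process X_t = 0.5 X_{t-1} + Z_t with Var(Z_t) = 1,
# whose autocovariance is gamma(h) = 0.5**h / (1 - 0.5**2).
ar1_gamma = [0.5 ** h / (1.0 - 0.5 ** 2) for h in range(3)]
coeffs, partial_acf, innovation_var = levinson_durbin(ar1_gamma, order=2)
print(coeffs, partial_acf, innovation_var)
```

For this AR(1) example the partial autocorrelation vanishes beyond lag 1, which is the characterization of AR processes listed in the outline.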

Case Study 1 ARMA modeling 

  • Day III: Heteroscedastic models.

Lecture 4 Modeling volatility in financial data. Volatility is mainly used as a measure of risk in financial time series. The main limitation of linear models is that their volatility is constant. Introducing heteroscedastic models while preserving stationarity can be done by conditioning. The main part of the lecture is dedicated to the class of GARCH processes, which are built on the idea that conditional volatility can be made random in a way similar to the conditional mean of ARMA processes. We will compare such models with the stochastic volatility model, where the volatility is exogenous.

1) Standard models for financial time series
  a) Statistical properties of returns
  b) What’s wrong with ARMA models?
  c) Stochastic volatility models
  d) ARCH and GARCH models
2) Explicit construction of GARCH processes
  a) Construction from an IID sequence
  b) Stochastic autoregressive models
  c) Stationary non-anticipative solutions
  d) Empirical study
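
The "construction from an IID sequence" item can be illustrated with a GARCH(1,1) recursion driven by a given noise sequence. This is a minimal sketch under assumptions: the parameter names `omega`, `alpha`, `beta`, the Gaussian noise, and the choice to start at the stationary variance are illustrative and not taken from the course material.

```python
import math
import random


def simulate_garch11(noise, omega, alpha, beta):
    """Build GARCH(1,1) returns X_t = sigma_t * eps_t from an IID sequence eps.

    Conditional variance: sigma2_t = omega + alpha * X_{t-1}**2 + beta * sigma2_{t-1}.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    sigma2 = omega / (1.0 - alpha - beta)  # start at the stationary variance
    returns, variances = [], []
    for eps in noise:
        x = math.sqrt(sigma2) * eps
        returns.append(x)
        variances.append(sigma2)
        # The next conditional variance depends on the past only.
        sigma2 = omega + alpha * x * x + beta * sigma2
    return returns, variances


# Illustrative run (assumption): standard Gaussian noise with a fixed seed.
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
returns, variances = simulate_garch11(noise, omega=0.1, alpha=0.1, beta=0.8)
print(f"largest conditional variance: {max(variances):.3f}")
```

Unlike in a stochastic volatility model, the conditional variance here is a deterministic function of past returns, so no extra source of randomness is needed.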

Case Study 2 GARCH & EGARCH modeling of log returns 

  • Day IV: Multivariate financial time series.

Lecture 5 Multivariate time series analysis. It has been known for a long time that investment requires diversification. To optimize a portfolio, it is indeed required to model the joint behavior of a panel of assets. The main goal of this lecture is to introduce the main tools for modeling and estimating the second order statistics of multivariate time series. Most of the classical linear models such as ARMA or VARMA models can be embedded in the class of dynamic linear models, where the dynamics are essentially carried out through a vector state variable, which is not directly observed. Efficient algorithms for filtering, forecasting and computing the likelihood will be presented. 

1) Second order statistics
  a) Basics of portfolio management
  b) Autocovariance matrices
  c) Spectral and cross-spectral density functions
2) Dynamic linear models
  a) General setting
  b) Main algorithms
  c) Illustrative example
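
As a hedged illustration of filtering and likelihood computation in a dynamic linear model, here is a scalar Kalman filter for the local level model (a random walk state observed with noise). The syllabus does not name the algorithms it covers, so the Kalman filter, the model, the function name, and all parameter values below are assumptions.

```python
import math


def local_level_filter(observations, q, r, m0=0.0, p0=1.0):
    """Kalman filter for x_t = x_{t-1} + w_t, y_t = x_t + v_t.

    q: variance of the state noise w_t, r: variance of the observation noise v_t.
    Returns the filtered (mean, variance) pairs and the Gaussian log-likelihood.
    """
    if q < 0 or r <= 0:
        raise ValueError("need q >= 0 and r > 0")
    m, p = m0, p0
    loglik = 0.0
    filtered = []
    for y in observations:
        p_pred = p + q        # predicted state variance (predicted mean is m)
        innovation = y - m    # one-step-ahead forecast error
        s = p_pred + r        # variance of the innovation
        gain = p_pred / s     # Kalman gain
        m = m + gain * innovation
        p = (1.0 - gain) * p_pred
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + innovation ** 2 / s)
        filtered.append((m, p))
    return filtered, loglik


# Illustrative data and parameters (assumptions).
filtered, loglik = local_level_filter([1.0, 1.2, 0.9], q=0.1, r=0.5)
print(filtered[-1], loglik)
```

The log-likelihood is accumulated from the innovations, so maximizing it over `q` and `r` gives a simple way to fit the model.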

Case Study 3 Fitting and forecasting realized volatility. 

  • Day V: Concluding day

Conference A data scientist’s view on banks and hedge funds. What you need to know before getting into the industry. By Antoine Pichot (Systematica Investments).

Report writing Finish and write a report on one of the case studies started before. The assigned case study will be made available here at 1 PM on Friday.

Introduction to Deep Learning with Python part 2

Instructors: Charles OLLION & Olivier GRISEL

ECTS: 2.5

Syllabus:

Period 4: Internship

Mandatory internship of at least 14 weeks, starting in April.