A new paradigm has emerged in AI over the last decade: generative modeling, which aims at designing a model that mimics the distribution described by a data set. It has two main applications: data augmentation, i.e. generating new data statistically consistent with those of the initial (training) data set, and digital twins, i.e. replacing a costly physical simulation model with an easy-to-use surrogate.

 

This paradigm has found huge applications in image generation, fashion pictures, chemical molecules, etc.; such generated content is sometimes called deepfakes.

 

A generative model usually consists of a random input (drawn from a latent space) and a deterministic transformation function (usually built from neural networks) mapping the latent space into the data space.
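
To fix ideas, here is a minimal sketch of this structure in formulas; the notation ($Z$ for the latent variable, $\gamma$ for its distribution, $g_\theta$ for the transformation, $d$ and $D$ for the latent and data dimensions) is ours and purely illustrative:

$$Z \sim \gamma = \mathcal{N}(0, I_d), \qquad X = g_\theta(Z) \in \mathbb{R}^D, \qquad \mu_\theta = (g_\theta)_\# \gamma,$$

so that the distribution of the generated samples is the push-forward $\mu_\theta$ of the latent distribution $\gamma$ by the map $g_\theta$.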

In this course, we will describe how to mathematically model and solve this problem, starting from basic concepts and moving on to more advanced tools. We will investigate, with mathematical tools:

- how to measure the distance between probability distributions, and the relations between these distances

- the existence of a transformation between two given distributions

- the design of the optimization problem of the generative model (one possible formulation, based on the Wasserstein distance, is sketched after this list)

- possible parameterizations based on generative adversarial networks (GANs), variational autoencoders (VAEs), and stochastic differential equations (SDEs)

- the role of the latent dimension in the search for a numerical solution, with quality/complexity bounds on examples

- some statistical analysis of the effect of sampling
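
As one illustrative example of the formulation announced above (the notation is ours and not necessarily the one adopted later in the course), one can take as distance between distributions the Wasserstein-1 distance in its Kantorovich-Rubinstein dual form, where the supremum runs over all 1-Lipschitz test functions $f$:

$$W_1(\mu, \nu) = \sup_{\mathrm{Lip}(f) \le 1} \; \mathbb{E}_{X \sim \mu}[f(X)] - \mathbb{E}_{Y \sim \nu}[f(Y)].$$

The training of the generative model introduced above can then be written as the minimization of this distance between the push-forward measure $\mu_\theta = (g_\theta)_\# \gamma$ and the empirical distribution $\mu_n$ of the $n$ training points $x_1, \dots, x_n$:

$$\min_{\theta} \; W_1\big( (g_\theta)_\# \gamma, \, \mu_n \big), \qquad \mu_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}.$$

This is, for instance, the structure of the Wasserstein GAN objective.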