Over the last few decades an ever-increasing amount of data is being collected in a wide range of applications. This has boosted the development of mathematical models that are able to analyze it and discover its underlying structure, and use the extracted information to solve a multitude of diﬀerent tasks, such as for predictive modelling or pattern recognition. The available data is however often complex and high-dimensional, making traditional data analysis methods ineﬀective in many applications. In the recent years there has then been a big focus on the development of more powerful models, that need to be general enough to be able to handle many diverse applications and kinds of data. Some of the most interesting advancements in this research direction have been recently obtained combining ideas from probabilistic modelling and deeplearning. Variationalauto-encoders(VAEs), that belong to the broader family of deep latent variable models, are powerful and scalable models that can be used for unsupervised learning of complex high-dimensional data distributions. They achieve this by parameterizing expressive probability distributions over the latent variables of the model using deep neural networks. VAEs can be used in applications with static data, for example as a generative model of images, but they are not suitable to model temporal data such as the sequences of images that form a video. However, a major part of the data that is being collected has a sequential nature, and ﬁnding powerful architectures that are able to model it is therefore fundamental. In the ﬁrst part of the thesis we will introduce a broad class of deep latent variable models for sequential data, that can be used for unsupervised learning of complex and high-dimensional sequential data distributions. We obtain these models by extending VAEs to the temporal setting, and further combining ideas from deep learning (e.g. deep and recurrent neural networks) and probabilistic modelling (e.g. state-space models) to deﬁne generative models for the data that use deep neural networks to parameterize very ﬂexible probability distributions. This results in a family of powerful architectures that can model a wide range of complex temporal data, and can be trained in a scalable way using large unlabelled datasets. In the second part of the thesis we will then present in detail three architectures belonging to this family of models. First, we will introduce stochastic recurrent neural networks (Fraccaro et al., 2016c), that combine the expressiveness of recurrent neural networks and the ability of state-space models to model the uncertainty in the learned latent representation. We will then present Kalman variational auto-encoders (Fraccaro et al., 2017), that can learn from data disentangled and more interpretable visual and dynamic representations. Finally, we will show that to deal with temporal applications that require a high memory capacity we can combine deep latent variable models with external memory architectures, as in the generative temporal model with spatial memory of Fraccaro et al. (2018).
|Number of pages||146|
|Publication status||Published - 2018|
|Series||DTU Compute PHD-2018|