Abstract
Probabilistic unsupervised learning aims to capture the generating distribution of data. In this setting, learning an interpretable model is challenging, and many models assume that the generating distribution is inherently Euclidean. We may achieve more insight by relaxing this constraint on the generating function: by considering, e.g., Riemannian geometry, we can compute more meaningful distances. Our goal is to learn the geometry of data, which allows distances to be computed under Riemannian geometries and leads to more interpretable models.
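As a minimal illustration of what computing distances under a Riemannian geometry involves, the sketch below measures the length of a latent curve under the pullback metric G(z) = J(z)ᵀJ(z) induced by a decoder. The toy decoder, the finite-difference Jacobian, and all names are illustrative stand-ins under assumed settings, not the thesis's models.

```python
import numpy as np

def decoder(z):
    # Toy smooth map from a 2-D latent space to a 3-D data space,
    # standing in for a learned generative mapping f.
    return np.array([z[0], z[1], np.sin(z[0]) * np.cos(z[1])])

def jacobian(f, z, eps=1e-6):
    # Finite-difference Jacobian of f at z.
    f0 = f(z)
    J = np.zeros((f0.size, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (f(z + dz) - f0) / eps
    return J

def riemannian_length(f, curve):
    # Length of a discretised latent curve under the pullback
    # metric G(z) = J(z)^T J(z) induced by the decoder f.
    length = 0.0
    for a, b in zip(curve[:-1], curve[1:]):
        J = jacobian(f, 0.5 * (a + b))
        v = b - a
        length += np.sqrt(v @ (J.T @ J) @ v)
    return length

# A straight line in latent space: its Riemannian length generally
# differs from its Euclidean length (sqrt(2) here).
curve = np.linspace([0.0, 0.0], [1.0, 1.0], 100)
print(riemannian_length(decoder, curve))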
Topology is the first step to capturing the geometry of data, and specifically, we employ symmetries in data as a proxy for topologies. We develop a simple workflow for detecting symmetries using tools from topological data analysis and investigate whether symmetry is preserved under dimensionality reduction in four models. Our quantitative study shows that these algorithms frequently break symmetry, highlighting the shortcomings of current visualisation tools. This is somewhat concerning, but our results indicate that the likelihood is a good indicator of preserved topology in the Gaussian process latent variable model.
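The thesis's detection workflow is not detailed in this abstract, but a generic version of the before/after comparison can be sketched with standard topological-data-analysis tooling: compute persistence diagrams for the data and for its low-dimensional embedding, then compare them. The circle data set and the PCA reducer below are illustrative assumptions.

```python
import numpy as np
from ripser import ripser          # persistent homology (pip install ripser)
from persim import bottleneck      # diagram distances (pip install persim)
from sklearn.decomposition import PCA

# Noisy circle: a data set with one clear 1-dimensional hole (H1 class).
theta = np.random.uniform(0, 2 * np.pi, 300)
X = np.c_[np.cos(theta), np.sin(theta), 0.1 * np.random.randn(300)]

# H1 persistence diagrams before and after dimensionality reduction.
dgm_before = ripser(X)['dgms'][1]
Z = PCA(n_components=2).fit_transform(X)
dgm_after = ripser(Z)['dgms'][1]

# A small bottleneck distance suggests the loop (the topological
# signature) survived the embedding; a large one signals breakage.
print('bottleneck distance:', bottleneck(dgm_before, dgm_after))
```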
Decoders built on Gaussian processes (GPs) are enticing due to the marginalisation over the non-linear function space, which further motivates our work with the Gaussian process latent variable model (GPLVM). Such models are expensive and therefore often scaled with variational inference and inducing points, but these are challenging to train. We develop the stochastic active sets approximation, a scalable and robust training scheme for GPLVMs that leads to interesting latent representations with more structure than those of the Bayesian GPLVM and comparable variational autoencoders.
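The stochastic active sets approximation itself is not reproduced here. As a point of reference for what is being scaled, a minimal full-kernel GPLVM can be trained by optimising the GP marginal likelihood with respect to the latent coordinates and kernel hyperparameters, as in the PyTorch sketch below; all data, dimensions, and hyperparameters are placeholders.

```python
import torch

# Minimal full-kernel GPLVM sketch: optimise 2-D latents Z by
# maximising the GP marginal likelihood of observations Y under an
# RBF kernel. (Inducing points and the thesis's stochastic active
# sets scheme are not reproduced here.)
torch.manual_seed(0)
N, D, Q = 60, 5, 2
Y = torch.randn(N, D)                       # placeholder observations
Z = torch.randn(N, Q, requires_grad=True)   # latent coordinates
log_ls = torch.zeros(1, requires_grad=True)     # log lengthscale
log_noise = torch.zeros(1, requires_grad=True)  # log noise variance

def rbf(Za, Zb, log_ls):
    # Squared-exponential kernel on latent coordinates.
    d2 = torch.cdist(Za, Zb) ** 2
    return torch.exp(-0.5 * d2 / torch.exp(log_ls) ** 2)

opt = torch.optim.Adam([Z, log_ls, log_noise], lr=0.01)
for step in range(500):
    K = rbf(Z, Z, log_ls) + torch.exp(log_noise) * torch.eye(N)
    L = torch.linalg.cholesky(K)
    alpha = torch.cholesky_solve(Y, L)
    # Negative GP marginal log-likelihood, summed over the D output
    # dimensions (additive constant dropped).
    nll = 0.5 * (Y * alpha).sum() + D * torch.log(torch.diag(L)).sum()
    opt.zero_grad()
    nll.backward()
    opt.step()
```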
We attempt to build the geometry into the prior of a GPLVM. We develop the computational framework for this model and consider the Riemannian Brownian motion a suitable choice of prior for this purpose. We fit this to a GP manifold but, although we have the needed components, we do not succeed in implementing training of the model.
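To make the prior concrete: Riemannian Brownian motion can be simulated in local coordinates with an Euler–Maruyama scheme whose step covariance is G⁻¹(z) dt. The sketch below uses a toy metric and, for brevity, omits the Christoffel-symbol drift term; it is an assumption-laden illustration, not the thesis's implementation.

```python
import numpy as np

def metric(z):
    # Toy position-dependent metric G(z); in the intended setting this
    # would be the pullback metric of a GP manifold.
    return np.diag([1.0 + z[0] ** 2, 1.0 + z[1] ** 2])

def brownian_motion(z0, n_steps=1000, dt=1e-3, rng=None):
    # Euler–Maruyama simulation of Brownian motion on a Riemannian
    # manifold, keeping the diffusion term (step covariance G^{-1} dt)
    # and omitting the Christoffel drift for brevity.
    rng = np.random.default_rng() if rng is None else rng
    path = [np.asarray(z0, dtype=float)]
    for _ in range(n_steps):
        z = path[-1]
        G_inv = np.linalg.inv(metric(z))
        L = np.linalg.cholesky(G_inv * dt)
        path.append(z + L @ rng.standard_normal(z.size))
    return np.array(path)

path = brownian_motion([0.0, 0.0])
print(path.shape)  # (1001, 2)
```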
The intention behind capturing geometry is to capture the essence of the generating process of observations. This has the potential to synthesise patterns in amounts of data that humans would otherwise be unable to grasp, and to provide insights in a humanly interpretable format. This could enable humans to learn from machine learning.
| Original language | English |
|---|---|
| Publisher | Technical University of Denmark |
| Number of pages | 181 |
| Publication status | Published - 2023 |