Abstract
All generative models use representations of data to arrive at their results, and understanding these representations is a natural gateway to understanding the models. This thesis makes three excursions into the world of representations of generative models.
1. We explore how diffusion models can benefit from more flexible representations using a learned encoder. During their generation process, diffusion models transform Gaussian noise into clean data, for example images, over a large number of time steps. The models are trained using a loss with three parts: the latent loss, which measures how far the noisiest image is from a true Gaussian; the diffusion loss, which measures how well the model takes a step from a more noisy to a slightly less noisy image; and the reconstruction loss, which depends on how well the model goes from the smallest amount of noise to no noise at all. We find that using a learned encoder lets the model retain less information from the image at the noisiest time step (better latent loss), while remaining as good or even slightly better at taking the steps from more noisy to less noisy images (same or better diffusion loss). A sketch of this three-part loss is given below the abstract.
2. We investigate how the representations of language models are affected by phenomena commonly found in high-dimensional data. Measuring distances between high-dimensional vectors often runs into a problem called concentration of distances, where the vectors are very sparsely distributed and there is very little difference between the smallest and largest distance; this can also lead to the related problem of nuisance hubness. Since language models use high-dimensional representations, we consider whether these problems also apply to them. We prove under reasonable assumptions that concentration of distances will not occur when the model is used for next-token prediction, and show empirically that the hubness which occurs in this case is not a nuisance phenomenon. However, if we instead make different kinds of comparisons between representations, for example measuring normalised Euclidean distance between token representations, then we find that both concentration of distances and nuisance hubness can occur (both effects are illustrated by the simulation sketch below the abstract).
3. We inspect, from the perspective of identifiability, the connection between the output probability distributions and the representations of a broad class of models that includes language models. We prove that if we use the KL divergence to measure the difference between distributions, then we cannot get any guarantees on the similarity of representations. We then show that it is possible to define a metric between sets of distributions and a dissimilarity measure between representations such that a bound on the former gives a bound on the latter (the shape of both statements is sketched below the abstract).
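To make the three-part loss in point 1 concrete, the following is a minimal sketch of the standard variational bound used to train diffusion models, written in generic notation rather than the thesis' own; the exact weighting and parameterisation used in the thesis may differ.

```latex
% Minimal sketch of the diffusion training objective split into the three parts
% named above (generic notation, assumed rather than taken from the thesis).
% z_1, ..., z_T are increasingly noisy latents of the data x; q is the fixed
% noising process and p_\theta the learned denoising model.
\mathcal{L}(x) =
    \underbrace{D_{\mathrm{KL}}\!\big(q(z_T \mid x) \,\|\, p(z_T)\big)}_{\text{latent loss}}
  + \sum_{t=2}^{T}
    \underbrace{\mathbb{E}_{q}\, D_{\mathrm{KL}}\!\big(q(z_{t-1} \mid z_t, x) \,\|\, p_\theta(z_{t-1} \mid z_t)\big)}_{\text{diffusion loss}}
  \;-\; \underbrace{\mathbb{E}_{q}\, \log p_\theta(x \mid z_1)}_{\text{reconstruction loss}}
```

Roughly speaking, introducing a learned encoder means the fixed data x targeted by the noising process q is replaced by a learned encoding of x, which is how the latent loss can improve without hurting the diffusion loss.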
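The concentration-of-distances and hubness effects in point 2 can be illustrated with a short simulation on i.i.d. Gaussian vectors; the dimensions, sample sizes, and measures below are illustrative assumptions, not the thesis' experimental setup.

```python
# Minimal sketch (illustrative assumptions, not the thesis' experiments):
# as dimension grows, Euclidean distances from a query to i.i.d. Gaussian
# points concentrate, i.e. the relative contrast (d_max - d_min) / d_min
# shrinks, and the k-occurrence distribution becomes skewed (hubness).
import numpy as np

rng = np.random.default_rng(0)


def relative_contrast(points: np.ndarray, query: np.ndarray) -> float:
    """Relative difference between the largest and smallest distance to the query."""
    dists = np.linalg.norm(points - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())


def k_occurrence_skewness(points: np.ndarray, k: int = 10) -> float:
    """Skewness of how often each point appears among the k nearest neighbours of others."""
    sq = (points ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T  # squared distances
    np.fill_diagonal(d2, np.inf)                              # ignore self-neighbours
    knn = np.argsort(d2, axis=1)[:, :k]
    counts = np.bincount(knn.ravel(), minlength=len(points)).astype(float)
    return float(((counts - counts.mean()) ** 3).mean() / counts.std() ** 3)


for dim in (2, 10, 100, 1000):
    pts = rng.standard_normal((1000, dim))
    print(f"dim={dim:>4}  contrast={relative_contrast(pts, rng.standard_normal(dim)):.3f}"
          f"  k-occurrence skew={k_occurrence_skewness(pts):.2f}")
```

A heavy right tail (large positive skewness) of the k-occurrence counts means a few points keep appearing as nearest neighbours of many others, which is the usual operational definition of hubness.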
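The two results in point 3 have roughly the following shape, written with generic placeholder symbols (f, g for two models, r_f, r_g for their representations, d_rep for the dissimilarity measure) rather than the thesis' notation.

```latex
% Sketch of the statements in point 3 (generic notation, not the thesis' own).
% f and g are two models with representations r_f, r_g and conditional output
% distributions p_f(\cdot \mid x), p_g(\cdot \mid x).

% Negative result: closeness of outputs in KL divergence gives no guarantee on
% representation similarity, i.e. in general
\sup_x D_{\mathrm{KL}}\!\big(p_f(\cdot \mid x)\,\|\,p_g(\cdot \mid x)\big) \le \delta
\quad\not\Longrightarrow\quad
d_{\mathrm{rep}}(r_f, r_g) \le \varepsilon(\delta).

% Positive result: for a suitable metric D on sets of output distributions and a
% suitable dissimilarity d_{\mathrm{rep}} on representations,
D\big(\{p_f(\cdot \mid x)\}_{x},\, \{p_g(\cdot \mid x)\}_{x}\big) \le \delta
\quad\Longrightarrow\quad
d_{\mathrm{rep}}(r_f, r_g) \le \varepsilon(\delta),
\qquad \varepsilon(\delta) \to 0 \text{ as } \delta \to 0.
```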
| Original language | English |
|---|---|
| Publisher | Technical University of Denmark |
| Number of pages | 170 |
| Publication status | Published - 2025 |
Projects
On Representations of Generative Models
Nielsen, B. M. G. (PhD Student), Winther, O. (Main Supervisor), Schmidt, M. N. (Supervisor), Lioma, C. (Examiner) & Rodolà, E. (Examiner)
01/11/2022 → 02/03/2026
Project: PhD