Phase transition in PCA with missing data: Reduced signal-to-noise ratio, not sample size!

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

How does missing data affect our ability to learn signal structures? It has been shown that learning signal structure in terms of principal components depends on the ratio of sample size to dimensionality, and that a critical number of observations is needed before learning starts (Biehl and Mietzner, 1993). Here we generalize this analysis to include missing data. Probabilistic principal component analysis is regularly used for estimating signal structures in datasets with missing data. Our analytic result suggests that the effect of missing data is to reduce the effective signal-to-noise ratio rather than, as generally believed, to reduce the sample size. The theory predicts a phase transition in the learning curves, and this is indeed found both in simulated data and in real datasets.
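
As a rough illustration of the kind of learning curve the abstract describes, the sketch below simulates data with a single signal direction, removes entries at random, and measures how well the leading principal direction recovers the signal as the ratio of samples to dimensions grows. It is not the authors' probabilistic PCA procedure: it uses zero imputation followed by an SVD, and the function name `overlap_curve`, the one-component data model, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def overlap_curve(dim=100, signal_std=2.0, observed_frac=0.7,
                  alphas=(0.1, 0.5, 2.0, 8.0), seed=0):
    """Return (alpha, overlap) pairs, where alpha = samples / dimensions.

    Hypothetical helper for illustration only; not the paper's method.
    """
    rng = np.random.default_rng(seed)

    # Assumed data model: one unit-norm signal direction plus unit-variance noise.
    direction = rng.standard_normal(dim)
    direction /= np.linalg.norm(direction)

    curve = []
    for alpha in alphas:
        n_samples = max(2, int(round(alpha * dim)))
        latent = rng.standard_normal((n_samples, 1))
        data = signal_std * latent * direction + rng.standard_normal((n_samples, dim))

        # Entries are missing completely at random; missing values are set to zero.
        observed = rng.random((n_samples, dim)) < observed_frac
        imputed = np.where(observed, data, 0.0)

        # The data are zero-mean by construction, so no centering is applied.
        # The leading principal direction is the first right singular vector.
        _, _, vt = np.linalg.svd(imputed, full_matrices=False)
        overlap = abs(float(vt[0] @ direction))
        curve.append((alpha, overlap))
    return curve
```

Comparing, for example, `overlap_curve(observed_frac=1.0)` with `overlap_curve(observed_frac=0.3)` gives a crude view of how the amount of missing data changes the overlap at each sample-to-dimension ratio. A finer grid of `alphas` and a larger `dim` would be needed to see anything resembling a sharp transition.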
Original language: English
Title of host publication: Proceedings of Machine Learning Research
Volume: 97
Publisher: International Machine Learning Society (IMLS)
Publication date: 2019
Pages: 5248-5260
ISBN (Print): 9781510886988
Publication status: Published - 2019
Event: 36th International Conference on Machine Learning - Long Beach Convention Center, Long Beach, United States
Duration: 10 Jun 2019 to 15 Jun 2019
Conference number: 36

Conference

Conference: 36th International Conference on Machine Learning
Number: 36
Location: Long Beach Convention Center
Country: United States
City: Long Beach
Period: 10/06/2019 to 15/06/2019

Cite this

Ipsen, N. B., & Hansen, L. K. (2019). Phase transition in PCA with missing data: Reduced signal-to-noise ratio, not sample size! In Proceedings of Machine Learning Research (Vol. 97, pp. 5248-5260). International Machine Learning Society (IMLS).