Data analysis in high-dimensional sparse spaces: Large p, small n problems

    Research output: Book/Report › Ph.D. thesis › Research

    Abstract

    The present thesis considers data analysis of problems with many features relative to the number of observations (large p, small n problems). The theoretical considerations for such problems are outlined, including the curses and blessings of dimensionality and the importance of dimension reduction. In this context, the trade-off between a rich solution, which answers the questions at hand, and a simple solution, which generalizes to unseen data, is described. Labelled output exists for all of the given data examples, and the analyses are therefore limited to supervised settings. Three novel classification techniques for high-dimensional problems are presented: sparse discriminant analysis, sparse mixture discriminant analysis, and orthogonality constrained support vector machines. The first two introduce sparseness into the well-known linear and mixture discriminant analyses and thereby provide low-dimensional projections of the data with few non-zero loadings, which improve classification. The last adds a priori information on the pairing between observations to the support vector machine and thereby gives solutions with less variation and slight improvements in classification. The classification methods are applied to the classification of fish species, ear canal impressions used in the hearing aid industry, microbiological fungi species, and various cancerous and healthy tissues. In addition, novel applications of sparse regression (also called the elastic net) to the medical, concrete, and food industries, using multi-spectral images for objective and automated systems, are presented.
    Original language: English
    Place of Publication: Kgs. Lyngby, Denmark
    Publisher: Technical University of Denmark
    Publication status: Published - Mar 2010
    Series: IMM-PHD-2009-228

    Projects

    Data-analyse i sparse, høj-dimensionale rum (Data analysis in sparse, high-dimensional spaces)

    Clemmensen, L. K. H., Ersbøll, B. K., Larsen, R. W., Bigun, J. & Bro-Jørgensen, R.

    DTU scholarship

    01/04/2006 to 31/03/2010

    Project: PhD
