TY - BOOK

T1 - Sparse Classification - Methods & Applications

AU - Einarsson, Gudmundur

PY - 2018

Y1 - 2018

N2 - With an increasing number of ever more sophisticated tools for acquiring data, we are faced with the important question of what matters in the sea of information at hand. This challenge is becoming more prevalent across virtually all scientific disciplines. Improvements over state-of-the-art methods for analysing such data carry the potential to revolutionize tasks such as medical diagnostics, where decisions often need to be based on only a few high-dimensional observations. This explosion in data dimensionality has sparked the development of novel statistical methods. In contrast, classical statistics builds upon the assumption that we have more samples than variables, and the main asymptotic results, such as the central limit theorem, reflect that. As the assumption of having many samples does not hold for many modern datasets, we need new tools and methods to find the signal within a dataset that is predictive of the relevant response variable. The focus of this thesis is on sparse methods, where "sparse" means that the method selects only a few variables. Different types of data call for different methods. The sparse methods studied in this thesis concern settings where the response variable is ordinal. Such ordinal labeling is common in many fields; for example, medical doctors often summarize their observations into a single class of disease severity, known as a medical rating score. Automation offers the potential to improve both the reliability and objectivity of such tasks. To demonstrate the effectiveness of the sparse methods developed in this thesis, they were applied to challenging and diverse real-world problems: predicting the severity of motion disorders in Parkinson’s patients, generating short summaries of content from hundreds of online user reviews, and detecting foreign objects in multispectral X-ray scans. To achieve these results, novel optimization approaches and open-source software were implemented.

M3 - Ph.D. thesis

T3 - DTU Compute PHD-2018

BT - Sparse Classification - Methods & Applications

PB - DTU Compute

ER -