TY - RPRT

T1 - On discriminant analysis techniques and correlation structures in high dimensions

AU - Clemmensen, Line Katrine Harder

PY - 2013

Y1 - 2013

N2 - This paper compares several recently proposed techniques for performing discriminant analysis in high dimensions, and illustrates that the various sparse methods differ in prediction abilities depending on their underlying assumptions about the correlation structures in the data. The techniques generally focus on two things: Obtaining sparsity (variable selection) and regularizing the estimate of the within-class covariance matrix. For high-dimensional data, this gives rise to increased interpretability and generalization ability over standard linear discriminant analysis. Here, we group the methods in two: Those who assume independence between the variables and thus use a diagonal estimate of the within-class covariance matrix, and those who assume dependence between the variables and thus use an estimate of the within-class covariance matrix, which also estimates the correlations between variables. The two groups of methods are compared and the pros and cons are exemplified using different cases of simulated data. The results illustrate that the estimate of the covariance matrix is an important factor with respect to choice of method, and the choice of method should thus be driven by the nature of the problem at hand.

AB - This paper compares several recently proposed techniques for performing discriminant analysis in high dimensions, and illustrates that the various sparse methods differ in prediction abilities depending on their underlying assumptions about the correlation structures in the data. The techniques generally focus on two things: Obtaining sparsity (variable selection) and regularizing the estimate of the within-class covariance matrix. For high-dimensional data, this gives rise to increased interpretability and generalization ability over standard linear discriminant analysis. Here, we group the methods in two: Those who assume independence between the variables and thus use a diagonal estimate of the within-class covariance matrix, and those who assume dependence between the variables and thus use an estimate of the within-class covariance matrix, which also estimates the correlations between variables. The two groups of methods are compared and the pros and cons are exemplified using different cases of simulated data. The results illustrate that the estimate of the covariance matrix is an important factor with respect to choice of method, and the choice of method should thus be driven by the nature of the problem at hand.

M3 - Report

T3 - Technical Report-2013

BT - On discriminant analysis techniques and correlation structures in high dimensions

PB - Technical University of Denmark

CY - Kgs. Lyngby

ER -