Abstract
In this paper, we present a new evaluation approach for missing data
techniques (MDTs) in which their efficiency is assessed using the
listwise deletion method as a reference. We experiment on classification
problems and calculate misclassification rates (MR) for different
missing data percentages (MDP) using a missing completely
at random (MCAR) scheme. We compare three MDTs: pairwise
deletion (PW), mean imputation (MI) and a maximum likelihood
method that we call complete expectation maximization (CEM). We
use a synthetic dataset, the Iris dataset and the Pima Indians Diabetes
dataset. We train a Gaussian mixture model (GMM). We test
the trained GMM in two cases, in which the test dataset either contains
missing values or is complete. The results show that CEM is the most
efficient method in both cases, while MI is the worst performer of the
three. PW and CEM prove to be more stable than MI, in particular for
higher MDP values.
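
To make the evaluation setup above concrete, the following is a minimal sketch of MCAR masking at a given MDP, the listwise deletion reference, and mean imputation (MI). It is not the authors' implementation: the function names, the toy data, the seeds, and the fallback value for a fully missing column are all assumptions made for illustration. Pairwise deletion, CEM, and the GMM training and testing steps are omitted.

```python
import math
import random
from statistics import fmean


def mcar_mask(data, mdp, seed=0):
    """Return a copy of `data` with each entry set to NaN with probability `mdp`.

    Under MCAR, missingness depends on neither observed nor unobserved
    values, so one independent draw per entry is enough.
    """
    rng = random.Random(seed)
    return [[math.nan if rng.random() < mdp else x for x in row] for row in data]


def listwise_deletion(data):
    """Keep only the rows that have no missing entries (the reference method)."""
    return [row for row in data if not any(math.isnan(x) for x in row)]


def mean_imputation(data):
    """Replace each missing entry with the mean of the observed values in its column."""
    col_means = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if not math.isnan(row[j])]
        # Assumption: fall back to 0.0 when a column has no observed values.
        col_means.append(fmean(observed) if observed else 0.0)
    return [
        [col_means[j] if math.isnan(x) else x for j, x in enumerate(row)]
        for row in data
    ]


# Hypothetical toy data: 200 samples, 2 features (not one of the paper's datasets).
toy_rng = random.Random(1)
complete = [[toy_rng.gauss(0.0, 1.0), toy_rng.gauss(5.0, 2.0)] for _ in range(200)]

incomplete = mcar_mask(complete, mdp=0.2)   # MDP of 20 %
reduced = listwise_deletion(incomplete)     # fewer rows, no missing values
imputed = mean_imputation(incomplete)       # same row count, no missing values
print(len(complete), len(reduced), len(imputed))
```

Listwise deletion shrinks the training set as the MDP grows, whereas mean imputation keeps every sample but replaces missing entries with a constant per feature, which reduces the apparent variance of that feature. In an experiment of the kind described, a classifier would then be trained on each resulting dataset and its MR compared across MDP values.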
Original language | English |
---|---|
Title of host publication | Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Place of Publication | Prague |
Publisher | IEEE |
Publication date | 2011 |
Pages | 2260-2263 |
ISBN (Print) | 978-1-4577-0538-0 |
ISBN (Electronic) | 978-1-4577-0537-3 |
Publication status | Published - 2011 |
Event | 2011 IEEE International Conference on Acoustics, Speech and Signal Processing, Prague, Czech Republic. Duration: 22 May 2011 → 27 May 2011. Conference number: 36. http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=5916934 |
Conference
Conference | 2011 IEEE International Conference on Acoustics, Speech and Signal Processing |
---|---|
Number | 36 |
Country/Territory | Czech Republic |
City | Prague |
Period | 22/05/2011 → 27/05/2011 |
Keywords
- Missing data techniques
- Supervised learning
- Machine learning