How efficient is estimation with missing data?

Seliz Karadogan, Letizia Marchegiani, Lars Kai Hansen, Jan Larsen

    Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

    Abstract

    In this paper, we present a new evaluation approach for missing data techniques (MDTs), in which their efficiency is investigated using the listwise deletion method as a reference. We experiment on classification problems and calculate misclassification rates (MR) for different missing data percentages (MDP) under a missing completely at random (MCAR) scheme. We compare three MDTs: pairwise deletion (PW), mean imputation (MI), and a maximum likelihood method that we call complete expectation maximization (CEM). We use a synthetic dataset, the Iris dataset, and the Pima Indians Diabetes dataset. We train a Gaussian mixture model (GMM) and test it in two cases, in which the test dataset is either incomplete or complete. The results show that CEM is the most efficient method in both cases, while MI is the worst performer of the three. PW and CEM prove to be more stable than MI, in particular at higher MDP values.
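
As a rough illustration of the setup described in the abstract, the following minimal sketch removes entries under an MCAR scheme and then applies the listwise deletion reference and mean imputation. It is not the authors' code: the function names, the random data, and the 20% missing data percentage are assumptions chosen for illustration, and the GMM training, PW, and CEM steps are not shown.

```python
import numpy as np

def mcar_mask(shape, mdp, rng):
    """Boolean mask that is True where a value is removed.
    Each entry is dropped independently with probability mdp (MCAR)."""
    return rng.random(shape) < mdp

def listwise_deletion(X):
    """Keep only the rows that have no missing (NaN) entries."""
    return X[~np.isnan(X).any(axis=1)]

def mean_imputation(X):
    """Replace each NaN with the mean of the observed values in its column."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

# Hypothetical data: 200 samples, 4 features (not one of the paper's datasets).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))

# Remove roughly 20% of the entries completely at random (assumed MDP).
X_missing = X.copy()
X_missing[mcar_mask(X.shape, 0.2, rng)] = np.nan

X_lw = listwise_deletion(X_missing)  # reference method: fewer rows, all complete
X_mi = mean_imputation(X_missing)    # same shape as X, no NaN left
```

Listwise deletion shrinks the training set as the missing data percentage grows, which is why it serves as a natural baseline for measuring how much an MDT recovers.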
    Original language: English
    Title of host publication: Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    Place of Publication: Prague
    Publisher: IEEE
    Publication date: 2011
    Pages: 2260-2263
    ISBN (Print): 978-1-4577-0538-0
    ISBN (Electronic): 978-1-4577-0537-3
    Publication status: Published - 2011
    Event: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing - Prague, Czech Republic
    Duration: 22 May 2011 to 27 May 2011
    Conference number: 36
    http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=5916934

    Conference

    Conference: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing
    Number: 36
    Country/Territory: Czech Republic
    City: Prague
    Period: 22/05/2011 to 27/05/2011

    Keywords

    • Missing data techniques
    • Supervised learning
    • Machine learning
