On Optimal Data Split for Generalization Estimation and Model Selection

Jan Larsen, Cyril Goutte

    Research output: Chapter in Book/Report/Conference proceeding, Article in proceedings, Research, peer-review

    Abstract

    The paper is concerned with studying the very different behavior of the two data splits using hold-out cross-validation, K-fold cross-validation and randomized permutation cross-validation. First we describe the theoretical basics of various cross-validation techniques with the purpose of reliably estimating the generalization error and optimizing the model structure. The paper deals with the simple problem of estimating a single location parameter. This problem is tractable as non-asymptotic theoretical analysis is possible, whereas mainly asymptotic analysis and simulation studies are viable for the more complex AR-models and neural networks.
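
As a rough illustration of the three split schemes named in the abstract, the sketch below estimates the generalization error of a single location parameter, taken here to be the sample mean of the training part. The function names, the validation fraction `gamma`, the squared-error cost, and the way folds are formed are assumptions made for illustration; none of them is taken from the paper.

```python
import random
from statistics import fmean


def holdout_cv(x, gamma, rng):
    """Hold-out: one random split, a fraction gamma of the data is held out.

    Assumes len(x) >= 2 so that both parts are non-empty.
    """
    n = len(x)
    n_val = max(1, min(n - 1, round(gamma * n)))
    idx = list(range(n))
    rng.shuffle(idx)
    val, train = idx[:n_val], idx[n_val:]
    w = fmean([x[i] for i in train])          # location estimate from training part
    return fmean([(x[i] - w) ** 2 for i in val])


def kfold_cv(x, k):
    """K-fold: every sample is held out exactly once (assumes 2 <= k <= len(x))."""
    n = len(x)
    folds = [list(range(j, n, k)) for j in range(k)]  # interleaved folds (illustrative choice)
    errors = []
    for val in folds:
        held_out = set(val)
        train = [i for i in range(n) if i not in held_out]
        w = fmean([x[i] for i in train])
        errors.extend((x[i] - w) ** 2 for i in val)
    return fmean(errors)


def permutation_cv(x, gamma, repeats, rng):
    """Randomized permutation CV: average hold-out error over random re-splits."""
    return fmean([holdout_cv(x, gamma, rng) for _ in range(repeats)])


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(50)]
    print("hold-out   :", holdout_cv(data, 0.5, rng))
    print("10-fold    :", kfold_cv(data, 10))
    print("permutation:", permutation_cv(data, 0.5, 20, rng))
```

Setting `k` equal to the number of samples turns `kfold_cv` into leave-one-out, and `gamma` controls how much data is held out for validation rather than used for estimation.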
    Original language: English
    Title of host publication: Proceedings of the IEEE Workshop on Neural Networks for Signal Processing IX
    Place of Publication: Piscataway
    Publisher: IEEE
    Publication date: 1999
    Pages: 225-234
    ISBN (Print): 0-7803-5673-x
    Publication status: Published - 1999
    Event: 1999 IEEE Workshop on Neural Networks for Signal Processing IX - Madison, WI, United States
    Duration: 23 Aug 1999 to 25 Aug 1999
    Conference number: 9
    http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6375

    Workshop

    Workshop: 1999 IEEE Workshop on Neural Networks for Signal Processing IX
    Number: 9
    Country: United States
    City: Madison, WI
    Period: 23/08/1999 to 25/08/1999

    Bibliographical note

    Copyright 1999 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
