Enhanced Context Recognition by Sensitivity Pruned Vocabularies

Rasmus Elsborg Madsen, Sigurdur Sigurdsson, Lars Kai Hansen

    Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

    Abstract

    Language-independent "bag-of-words" (BOW) representations are surprisingly effective for text classification. The generic BOW approach is based on a high-dimensional vocabulary, which may reduce the generalization performance of subsequent classifiers, e.g., those based on ill-posed principal component transformations. In this communication our aim is to study the effect of sensitivity-based pruning of the bag-of-words representation. We consider neural-network-based sensitivity maps to determine term relevance when pruning the vocabularies. With reduced vocabularies, documents are classified using a latent semantic indexing representation and a probabilistic neural network classifier. Pruning the vocabularies to approximately 20% of the original size, we find consistent context recognition enhancement for two mid-size data sets for a range of training set sizes. We also study the applicability of the sensitivity measure for automated keyword generation.
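
The pruning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a linear softmax classifier stands in for the neural network, the sensitivity of a term is assumed to be the mean absolute derivative of the class posteriors with respect to that term's count (the abstract does not give the definition), and the latent semantic indexing and probabilistic neural network stages are omitted. All function names and the toy data are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def posteriors(W, x):
    """Class posteriors of a linear softmax model for one document vector."""
    return softmax([sum(wj * xj for wj, xj in zip(w, x)) for w in W])

def train_softmax(X, y, n_classes, epochs=200, lr=0.5):
    """Fit the stand-in classifier by batch gradient descent on cross-entropy."""
    n_terms = len(X[0])
    W = [[0.0] * n_terms for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_terms for _ in range(n_classes)]
        for x, label in zip(X, y):
            p = posteriors(W, x)
            for c in range(n_classes):
                err = p[c] - (1.0 if c == label else 0.0)
                for j in range(n_terms):
                    grad[c][j] += err * x[j]
        for c in range(n_classes):
            for j in range(n_terms):
                W[c][j] -= lr * grad[c][j] / len(X)
    return W

def term_sensitivities(W, X):
    """Assumed sensitivity: mean |dP(c|x)/dx_j| over documents and classes."""
    n_classes, n_terms = len(W), len(W[0])
    sens = [0.0] * n_terms
    for x in X:
        p = posteriors(W, x)
        for j in range(n_terms):
            avg_w = sum(p[k] * W[k][j] for k in range(n_classes))
            for c in range(n_classes):
                # Softmax derivative: dP_c/dx_j = P_c * (W_cj - sum_k P_k W_kj)
                sens[j] += abs(p[c] * (W[c][j] - avg_w))
    return [s / (len(X) * n_classes) for s in sens]

def prune_vocabulary(vocab, sens, keep_fraction=0.2):
    """Keep the most sensitive fraction of terms, in original vocabulary order."""
    n_keep = max(1, round(keep_fraction * len(vocab)))
    ranked = sorted(range(len(vocab)), key=lambda j: sens[j], reverse=True)
    return [vocab[j] for j in sorted(ranked[:n_keep])]

# Toy bag-of-words data: only "goal" and "vote" carry class information.
vocab = ["goal", "vote", "the", "and", "of", "to", "in", "is", "it", "on"]
X, y = [], []
for i in range(12):
    label = i % 2
    doc = [0.0] * len(vocab)
    doc[label] = 2.0                     # class 0 uses "goal", class 1 uses "vote"
    for j in range(2, len(vocab)):
        doc[j] = float((i + j) % 3)      # filler counts, balanced across classes
    X.append(doc)
    y.append(label)

W = train_softmax(X, y, n_classes=2)
sens = term_sensitivities(W, X)
kept = prune_vocabulary(vocab, sens, keep_fraction=0.2)
print(kept)
```

The `keep_fraction=0.2` default mirrors the roughly 20% vocabulary size reported in the abstract. Ranking the same sensitivities without a cutoff is one plausible way to approach the keyword generation use the abstract mentions, though the record does not say how the authors do it.
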
    Original language: English
    Title of host publication: Proceedings of 17th International Conference on Pattern Recognition (ICPR 2004)
    Volume: 2
    Publication date: 2004
    Pages: 483-486
    Publication status: Published - 2004
    Event: 17th International Conference on Pattern Recognition - Cambridge, United Kingdom
    Duration: 26 Aug 2004 → 26 Aug 2004
    Conference number: 17

    Conference

    Conference: 17th International Conference on Pattern Recognition
    Number: 17
    Country/Territory: United Kingdom
    City: Cambridge
    Period: 26/08/2004 → 26/08/2004
