Enhanced Context Recognition by Sensitivity Pruned Vocabularies

Rasmus Elsborg Madsen, Sigurdur Sigurdsson, Lars Kai Hansen

    Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review

    Abstract

    Language-independent 'bag-of-words' (BOW) representations are surprisingly effective for text classification. The generic BOW approach is based on a high-dimensional vocabulary, which may reduce the generalization performance of subsequent classifiers, e.g., those based on ill-posed principal component transformations. In this communication, our aim is to study the effect of sensitivity-based pruning of the bag-of-words representation. We consider neural-network-based sensitivity maps for determining term relevancy when pruning the vocabularies. With the reduced vocabularies, documents are classified using a latent semantic indexing (LSI) representation and a probabilistic neural network classifier. Pruning the vocabularies to approximately 20% of their original size, we find consistent context recognition enhancement for two mid-size data sets over a range of training set sizes. We also study the applicability of the sensitivity measure for automated keyword generation.
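    The pipeline described in the abstract (sensitivity map, vocabulary pruning to about 20%, LSI projection, probabilistic neural network) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: a single-layer softmax network stands in for the paper's neural network, the sensitivity of term j is taken as the mean squared derivative of the class posteriors with respect to that term, LSI is a plain truncated SVD of the document-term matrix, and the PNN is a Gaussian Parzen-window classifier. All function names, the toy data and the parameters (`lr`, `epochs`, `k`, `sigma`) are hypothetical.

    ```python
    import numpy as np


    def softmax(a):
        # Numerically stable row-wise softmax.
        a = a - a.max(axis=1, keepdims=True)
        e = np.exp(a)
        return e / e.sum(axis=1, keepdims=True)


    def train_softmax(X, y, n_classes, lr=0.5, epochs=300):
        # Hypothetical stand-in for the paper's neural network:
        # a single-layer softmax model trained by full-batch gradient descent.
        n, d = X.shape
        W = np.zeros((d, n_classes))
        b = np.zeros(n_classes)
        T = np.eye(n_classes)[y]  # one-hot targets
        for _ in range(epochs):
            P = softmax(X @ W + b)
            grad = (P - T) / n  # gradient of mean cross-entropy w.r.t. logits
            W -= lr * (X.T @ grad)
            b -= lr * grad.sum(axis=0)
        return W, b


    def sensitivity_map(W, b, X):
        # Mean squared derivative of each class posterior w.r.t. each term.
        P = softmax(X @ W + b)  # (N, C)
        Wbar = P @ W.T  # (N, D): sum_k P[n, k] * W[d, k]
        # dP[n, c] / dX[n, d] = P[n, c] * (W[d, c] - Wbar[n, d])
        G = P[:, None, :] * (W[None, :, :] - Wbar[:, :, None])  # (N, D, C)
        return (G ** 2).mean(axis=(0, 2))  # (D,)


    def prune_vocabulary(sens, keep_frac=0.2):
        # Indices (sorted) of the most sensitive terms, about keep_frac of the vocabulary.
        k = max(1, int(round(keep_frac * len(sens))))
        return np.sort(np.argsort(sens)[::-1][:k])


    def lsi_basis(X, k):
        # LSI: top-k right singular vectors of the document-term matrix.
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        return Vt[:k].T  # (D, k)


    def pnn_predict(Z_train, y_train, Z_test, sigma=0.1):
        # Probabilistic neural network: Gaussian Parzen density per class, pick the largest.
        classes = np.unique(y_train)
        d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
        K = np.exp(-d2 / (2 * sigma ** 2))
        scores = np.stack([K[:, y_train == c].mean(axis=1) for c in classes], axis=1)
        return classes[scores.argmax(axis=1)]


    # Toy usage on synthetic counts (hypothetical data, not the paper's data sets).
    rng = np.random.default_rng(0)
    n_docs, n_terms = 200, 50
    y = rng.integers(0, 2, size=n_docs)
    X = rng.poisson(1.0, size=(n_docs, n_terms)).astype(float)
    X[y == 0, 0] += 3.0  # term 0 marks class 0
    X[y == 1, 1] += 3.0  # term 1 marks class 1
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # L2-normalise each document

    tr, te = slice(0, 150), slice(150, 200)
    W, b = train_softmax(X[tr], y[tr], n_classes=2)
    sens = sensitivity_map(W, b, X[tr])
    keep = prune_vocabulary(sens, keep_frac=0.2)  # 10 of 50 terms
    V = lsi_basis(X[tr][:, keep], k=5)
    pred = pnn_predict(X[tr][:, keep] @ V, y[tr], X[te][:, keep] @ V, sigma=0.1)
    acc = float((pred == y[te]).mean())
    ```

    In this sketch the sensitivity map, the pruning and the LSI basis are all estimated from the training documents only, so the held-out documents are projected with a basis they did not help build. The ranking in `sens` could also be read off directly as a list of candidate keywords, in the spirit of the keyword-generation use mentioned in the abstract.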
    Original language: English
    Title of host publication: Proceedings of 17th International Conference on Pattern Recognition (ICPR 2004)
    Publication date: 2004
    Pages: 483-486
    Publication status: Published - 2004