Part-of-Speech Enhanced Context Recognition

Rasmus Elsborg Madsen, Jan Larsen, Lars Kai Hansen

    Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › Peer-review


    Abstract

    Language-independent 'bag-of-words' representations are surprisingly effective for text classification. In this communication our aim is to elucidate the synergy between language-independent features and simple language model features. We consider term tag features estimated by a so-called part-of-speech tagger. The feature sets are combined in an early binding design with an optimized binding coefficient that allows weighting of the relative variance contributions of the participating feature sets. With the combined features, documents are classified using a latent semantic indexing representation and a probabilistic neural network classifier. Three medium-size data sets are analyzed, and we find consistent synergy between the term and natural language features in all three sets for a range of training set sizes. The most significant enhancement is found for small text databases, where high recognition rates are possible.
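The abstract compresses a full pipeline into a few sentences. The Python sketch below (NumPy only) shows one way the pieces could fit together, under assumptions that are not taken from the paper: the documents arrive already POS-tagged (the toy corpus and its labels are hypothetical); the binding coefficient `alpha` is assumed to work by first scaling each feature block to unit total variance on the training data and then weighting the term block by sqrt(alpha) and the tag block by sqrt(1 - alpha), so that they contribute fractions alpha and 1 - alpha of the combined variance; latent semantic indexing is taken as a truncated SVD of the combined document-by-feature matrix; and the probabilistic neural network is implemented as a Gaussian-kernel Parzen-window classifier with equal class priors. All helper names and the defaults for `alpha`, `k`, and `sigma` are illustrative, not the authors' settings.

```python
import numpy as np

# Hypothetical pre-tagged toy corpus: each document is a list of
# (word, part-of-speech tag) pairs, as a POS tagger might output them.
TRAIN_DOCS = [
    [("stocks", "NNS"), ("fell", "VBD"), ("sharply", "RB")],
    [("markets", "NNS"), ("rallied", "VBD"), ("today", "NN")],
    [("the", "DT"), ("team", "NN"), ("won", "VBD"), ("the", "DT"), ("match", "NN")],
    [("a", "DT"), ("striker", "NN"), ("scored", "VBD"), ("twice", "RB")],
]
TRAIN_LABELS = np.array([0, 0, 1, 1])  # hypothetical classes: 0 = finance, 1 = sport


def build_index(docs, key):
    """Map each distinct word (key=0) or tag (key=1) to a column index."""
    vocab = sorted({pair[key] for doc in docs for pair in doc})
    return {feature: j for j, feature in enumerate(vocab)}


def count_matrix(docs, index, key):
    """Document-by-feature count matrix; features unseen in training are dropped."""
    X = np.zeros((len(docs), len(index)))
    for d, doc in enumerate(docs):
        for pair in doc:
            j = index.get(pair[key])
            if j is not None:
                X[d, j] += 1.0
    return X


def block_scale(X_train):
    """Square root of the total (summed per-column) variance of a training block."""
    return max(np.sqrt(X_train.var(axis=0).sum()), 1e-12)


def early_binding(X_term, X_tag, alpha, scales):
    """Assumed form of early binding: concatenate the blocks so that, on the
    training data, terms contribute a fraction alpha of the total variance
    and tags contribute 1 - alpha."""
    s_term, s_tag = scales
    return np.hstack([np.sqrt(alpha) * X_term / s_term,
                      np.sqrt(1.0 - alpha) * X_tag / s_tag])


def lsi_basis(X, k):
    """Top-k right singular vectors of the document-by-feature matrix (LSI)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T  # shape (n_features, k); project documents with X @ basis


def pnn_predict(Z_train, y_train, Z_test, sigma):
    """Parzen-window classifier with a Gaussian kernel (equal class priors)."""
    classes = np.unique(y_train)
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    # Class-conditional density estimate: mean kernel value over each class.
    scores = np.stack([K[:, y_train == c].mean(axis=1) for c in classes], axis=1)
    return classes[np.argmax(scores, axis=1)]


def fit_predict(train_docs, y_train, test_docs, alpha=0.7, k=2, sigma=0.5):
    """Hypothetical end-to-end pipeline: counts -> binding -> LSI -> PNN."""
    words, tags = build_index(train_docs, 0), build_index(train_docs, 1)
    Xw_tr, Xt_tr = count_matrix(train_docs, words, 0), count_matrix(train_docs, tags, 1)
    scales = (block_scale(Xw_tr), block_scale(Xt_tr))  # training statistics only
    X_tr = early_binding(Xw_tr, Xt_tr, alpha, scales)
    X_te = early_binding(count_matrix(test_docs, words, 0),
                         count_matrix(test_docs, tags, 1), alpha, scales)
    basis = lsi_basis(X_tr, k)
    return pnn_predict(X_tr @ basis, y_train, X_te @ basis, sigma)


TEST_DOCS = [
    [("stocks", "NNS"), ("rallied", "VBD")],
    [("the", "DT"), ("striker", "NN"), ("won", "VBD")],
]
print(fit_predict(TRAIN_DOCS, TRAIN_LABELS, TEST_DOCS))
```

In a real experiment `alpha`, the latent dimension `k`, and the kernel width `sigma` would be tuned, for example by cross-validation over the training set; the abstract states only that the binding coefficient was optimized. Setting `alpha = 1` recovers a pure bag-of-words model, which gives a natural baseline for measuring any synergy from the tag features.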
    Original language: English
    Title of host publication: Proceedings of IEEE Workshop on Machine Learning for Signal Processing XIV
    Publisher: IEEE Press
    Publication date: 2004
    Pages: 635-644
    ISBN (Print): 0-7803-8608-4
    Publication status: Published - 2004
    Event: 14th IEEE Signal Processing Society Workshop Machine Learning for Signal Processing, 2004
    Duration: 1 Jan 2004 → …

    Conference

    Conference: 14th IEEE Signal Processing Society Workshop Machine Learning for Signal Processing, 2004
    Period: 01/01/2004 → …

    Bibliographical note

    Copyright: 2004 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

    Keywords

    • text mining
    • context recognition
    • latent space

