Large scale topic modeling made practical

Bjarne Ørum Wahlgreen, Lars Kai Hansen

    Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

    Abstract

    Topic models are of broad interest. They can be used for query expansion and result structuring in information retrieval, and as an important component in services such as recommender systems and user-adaptive advertising. In large-scale applications, both the size of the database (number of documents) and the size of the vocabulary can be significant challenges. Here we discuss two mechanisms that can make scalable solutions possible in the face of large document databases and large vocabularies. The first issue is addressed by a parallel distributed implementation, while the vocabulary problem is reduced by use of a large and carefully curated term set. We demonstrate the performance of the proposed system and, in the process, break a previously claimed ‘world record’ announced in April 2010, in both speed and problem size. We show that the use of a WordNet-derived vocabulary can identify topics on par with a much larger case-specific vocabulary.
    Original language: English
    Title of host publication: 2011 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)
    Publisher: IEEE
    Publication date: 2011
    ISBN (Print): 978-1-4577-1621-8
    ISBN (Electronic): 978-1-4577-1622-5
    Publication status: Published - 2011
    Event: 2011 IEEE International Workshop on Machine Learning for Signal Processing - Beijing, China
    Duration: 18 Sep 2011 – 21 Sep 2011
    Conference number: 21
    https://ieeexplore.ieee.org/xpl/conhome/6058570/proceeding

    Conference

    Conference: 2011 IEEE International Workshop on Machine Learning for Signal Processing
    Number: 21
    Country/Territory: China
    City: Beijing
    Period: 18/09/2011 – 21/09/2011
    Series: Machine Learning for Signal Processing
    ISSN: 1551-2541
