Abstract
Topic models are of broad interest. They can be used for
query expansion and result structuring in information retrieval
and as an important component in services such as
recommender systems and user-adaptive advertising. In large-scale
applications both the size of the database (number of
documents) and the size of the vocabulary can be significant
challenges. Here we discuss two mechanisms that can
make scalable solutions possible in the face of large document
databases and large vocabularies. The first issue is
addressed by a parallel distributed implementation, while
the vocabulary problem is reduced by use of a large and carefully
curated term set. We demonstrate the performance of
the proposed system and in the process break a previously
claimed ‘world record’, announced in April 2010, in both speed
and problem size. We show that the use of a WordNet-derived
vocabulary can identify topics on par with a much
larger case-specific vocabulary.
Original language | English |
---|---|
Title of host publication | 2011 IEEE International Workshop on Machine Learning for Signal Processing (MLSP) |
Publisher | IEEE |
Publication date | 2011 |
ISBN (Print) | 978-1-4577-1621-8 |
ISBN (Electronic) | 978-1-4577-1622-5 |
DOIs | |
Publication status | Published - 2011 |
Event | 2011 IEEE International Workshop on Machine Learning for Signal Processing, Beijing, China. Duration: 18 Sep 2011 → 21 Sep 2011. Conference number: 21. https://ieeexplore.ieee.org/xpl/conhome/6058570/proceeding |
Conference
Conference | 2011 IEEE International Workshop on Machine Learning for Signal Processing |
---|---|
Number | 21 |
Country/Territory | China |
City | Beijing |
Period | 18/09/2011 → 21/09/2011 |
Internet address | |
Series | Machine Learning for Signal Processing |
---|---|
ISSN | 1551-2541 |