TY - JOUR
T1 - Learning Supervised Topic Models for Classification and Regression from Crowds
AU - Rodrigues, Filipe
AU - Lourenco, Mariana
AU - Ribeiro, Bernardete
AU - Pereira, Francisco Camara
PY - 2017
Y1 - 2017
N2 - The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on supervised topic models. However, the nature of most annotation tasks, prone to ambiguity and noise, often with high volumes of documents, deems learning under a single-annotator assumption unrealistic or impractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages of the proposed models over state-of-the-art approaches.
AB - The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on supervised topic models. However, the nature of most annotation tasks, prone to ambiguity and noise, often with high volumes of documents, deems learning under a single-annotator assumption unrealistic or impractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages of the proposed models over state-of-the-art approaches.
U2 - 10.1109/TPAMI.2017.2648786
DO - 10.1109/TPAMI.2017.2648786
M3 - Journal article
C2 - 28103190
SN - 0162-8828
VL - 39
SP - 2409
EP - 2422
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 12
ER -