Abstract
Co-clustering is a problem of both theoretical and practical importance,
with applications in, e.g., market basket analysis, collaborative filtering,
and web scale text processing. We state the co-clustering problem in
terms of non-parametric generative models which can address the
issue of estimating the number of row and column clusters from a
hypothesis space of an infinite number of clusters. To scale
co-clustering to large applications, we exploit the fact that parameter
inference for co-clustering is well suited to parallel computing. We
develop a generic GPU framework for efficient inference on large
scale sparse bipartite networks and achieve a speedup of two orders
of magnitude compared to estimation based on conventional
CPUs. In terms of scalability we find for networks with more
than 100 million links that reliable inference can be achieved in
less than an hour on a single GPU. To efficiently manage memory
consumption on the GPU we exploit the structure of the posterior
likelihood to obtain a decomposition that readily allows model estimation
of the co-clustering problem on arbitrarily large networks
as well as distributed estimation on multiple GPUs. Finally we
evaluate the implementation on real-life large scale collaborative
filtering data and web scale text corpora, demonstrating that latent
mesoscale structures extracted by co-clustering, as
formulated by the Infinite Relational Model (IRM), are consistent
across consecutive runs with different initializations and are also relevant
for interpretation of the underlying processes in such large
scale networks.
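
The abstract does not spell out the decomposition it uses, so the following is only a minimal sketch of why such a decomposition is plausible, under stated assumptions. It assumes a collapsed Beta-Bernoulli likelihood, a common choice for the IRM on binary networks, with hypothetical hyperparameters `a` and `b`; the function names and the toy data are illustrative and not taken from the paper. Under that assumption the likelihood depends on the edge list only through the link counts between each pair of row and column clusters, and those counts are additive over disjoint chunks of edges, so each chunk could be processed separately (for example, one per GPU or per memory-sized batch) and the partial counts summed.

```python
import math
from collections import Counter


def block_link_counts(edges, row_z, col_z):
    """Count links between each (row cluster, column cluster) pair."""
    counts = Counter()
    for i, j in edges:
        counts[(row_z[i], col_z[j])] += 1
    return counts


def log_beta(a, b):
    """Logarithm of the Beta function, computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def irm_log_likelihood(link_counts, row_z, col_z, a=1.0, b=1.0):
    """Collapsed Beta-Bernoulli log-likelihood given cluster assignments.

    Assumption: each block has an independent Beta(a, b) prior on its
    link probability, which is integrated out analytically.
    """
    row_sizes = Counter(row_z)
    col_sizes = Counter(col_z)
    total = 0.0
    for k, n_rows in row_sizes.items():
        for l, n_cols in col_sizes.items():
            links = link_counts.get((k, l), 0)
            non_links = n_rows * n_cols - links
            total += log_beta(links + a, non_links + b) - log_beta(a, b)
    return total


# Toy bipartite network: 4 rows, 3 columns, 6 links (hypothetical data).
edges = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (3, 2)]
row_z = [0, 0, 1, 1]  # cluster assignment of each row
col_z = [0, 0, 1]     # cluster assignment of each column

# Sufficient statistics from the full edge list in one pass.
full_counts = block_link_counts(edges, row_z, col_z)

# The same statistics accumulated from chunks of two edges at a time.
chunked_counts = Counter()
for start in range(0, len(edges), 2):
    chunk = edges[start:start + 2]
    chunked_counts.update(block_link_counts(chunk, row_z, col_z))

full_ll = irm_log_likelihood(full_counts, row_z, col_z)
chunked_ll = irm_log_likelihood(chunked_counts, row_z, col_z)
```

Because only the small table of block counts has to be combined across chunks, the memory needed per chunk is bounded by the chunk size rather than by the size of the whole network. Whether the paper's GPU framework partitions the data this way is not stated in the abstract.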
Original language | English |
---|---|
Title of host publication | 2011 IEEE International Workshop on Machine Learning for Signal Processing (MLSP) |
Publisher | IEEE |
Publication date | 2011 |
ISBN (Print) | 978-1-4577-1621-8 |
ISBN (Electronic) | 978-1-4577-1622-5 |
Publication status | Published - 2011 |
Event | 2011 IEEE International Workshop on Machine Learning for Signal Processing - Beijing, China; Duration: 18 Sept 2011 → 21 Sept 2011; Conference number: 21; https://ieeexplore.ieee.org/xpl/conhome/6058570/proceeding |
Conference
Conference | 2011 IEEE International Workshop on Machine Learning for Signal Processing |
---|---|
Number | 21 |
Country/Territory | China |
City | Beijing |
Period | 18/09/2011 → 21/09/2011 |
Series | Machine Learning for Signal Processing |
---|---|
ISSN | 1551-2541 |