Non-parametric co-clustering of large scale sparse bipartite networks on the GPU

Toke Jansen Hansen, Morten Mørup, Lars Kai Hansen

    Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review

    Abstract

    Co-clustering is a problem of both theoretical and practical importance, with applications in, e.g., market basket analysis, collaborative filtering, and web scale text processing. We state the co-clustering problem in terms of non-parametric generative models, which address the issue of estimating the number of row and column clusters from a hypothesis space containing an infinite number of clusters. To reach large scale applications of co-clustering, we exploit the fact that parameter inference for co-clustering is well suited to parallel computing. We develop a generic GPU framework for efficient inference on large scale sparse bipartite networks and achieve a speedup of two orders of magnitude compared to estimation on conventional CPUs. In terms of scalability, we find that for networks with more than 100 million links reliable inference can be achieved in less than an hour on a single GPU. To manage memory consumption on the GPU efficiently, we exploit the structure of the posterior likelihood to obtain a decomposition that readily allows model estimation of the co-clustering problem on arbitrarily large networks, as well as distributed estimation on multiple GPUs. Finally, we evaluate the implementation on real-life large scale collaborative filtering data and web scale text corpora, demonstrating that the latent mesoscale structures extracted by the co-clustering problem as formulated by the Infinite Relational Model (IRM) are consistent across consecutive runs with different initializations and are also relevant for interpreting the underlying processes in such large scale networks.
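
The abstract does not spell out the likelihood, so the following is only a minimal sketch of the kind of decomposition it alludes to. It assumes a Beta-Bernoulli block model, a common choice for the IRM, in which the link probabilities are integrated out and the collapsed log-likelihood becomes a sum of independent terms, one per (row cluster, column cluster) block. The function names, the edge-list input format, and the hyperparameters `a` and `b` are all illustrative assumptions, and the non-parametric prior over the cluster assignments is left out.

```python
from math import lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def block_link_counts(links, z_rows, z_cols, n_row_clusters, n_col_clusters):
    """Count links per (row cluster, column cluster) block from a sparse edge list."""
    counts = [[0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, j in links:
        counts[z_rows[i]][z_cols[j]] += 1
    return counts


def collapsed_log_likelihood(links, z_rows, z_cols, a=1.0, b=1.0):
    """Collapsed Beta-Bernoulli log-likelihood of a bipartite network.

    Assumption (not taken from the abstract): each block has its own link
    probability with a Beta(a, b) prior, which is integrated out analytically.
    """
    n_row_clusters = max(z_rows) + 1
    n_col_clusters = max(z_cols) + 1

    # Cluster sizes give the number of possible links in each block.
    row_sizes = [0] * n_row_clusters
    for z in z_rows:
        row_sizes[z] += 1
    col_sizes = [0] * n_col_clusters
    for z in z_cols:
        col_sizes[z] += 1

    counts = block_link_counts(links, z_rows, z_cols, n_row_clusters, n_col_clusters)

    # Each block contributes an independent term, so the sum can be split
    # across workers or devices and the partial sums added afterwards.
    total = 0.0
    for k in range(n_row_clusters):
        for l in range(n_col_clusters):
            n_links = counts[k][l]
            n_nonlinks = row_sizes[k] * col_sizes[l] - n_links
            total += log_beta(a + n_links, b + n_nonlinks) - log_beta(a, b)
    return total


# Toy 2 x 2 network with links on the diagonal (hypothetical data).
toy_links = [(0, 0), (1, 1)]
ll_split = collapsed_log_likelihood(toy_links, [0, 1], [0, 1])
ll_merged = collapsed_log_likelihood(toy_links, [0, 0], [0, 0])
```

Only the link counts depend on the edge list, and they are accumulated in a single pass over the non-zero entries, which is why a formulation of this kind suits sparse networks and parallel hardware. On the toy data the split assignment scores higher than the merged one under the uniform Beta(1, 1) prior assumed here.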
    Original language: English
    Title of host publication: 2011 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)
    Publisher: IEEE
    Publication date: 2011
    ISBN (Print): 978-1-4577-1621-8
    ISBN (Electronic): 978-1-4577-1622-5
    Publication status: Published - 2011
    Event: 2011 IEEE International Workshop on Machine Learning for Signal Processing - Beijing, China
    Duration: 1 Jan 2011 → …
    http://mlsp2011.conwiz.dk/

    Conference

    Conference: 2011 IEEE International Workshop on Machine Learning for Signal Processing
    Country: China
    City: Beijing
    Period: 01/01/2011 → …
    Series: Machine Learning for Signal Processing
    ISSN: 1551-2541
