TY - JOUR
T1 - An Empirical Comparison of Algorithms to Find Communities in Directed Graphs and Their Application in Web Data Analytics
AU - Agreste, Santa
AU - De Meo, Pasquale
AU - Fiumara, Giacomo
AU - Piccione, Giuseppe
AU - Piccolo, Sebastiano
AU - Rosaci, Domenico
AU - Sarne, Giuseppe M. L.
AU - Vasilakos, Athanasios V.
PY - 2017
Y1 - 2017
N2 - Detecting communities in graphs is a fundamental tool to understand the structure of Web-based systems and predict their evolution. Many community detection algorithms are designed to process undirected graphs (i.e., graphs with bidirectional edges) but many graphs on the Web-e.g., microblogging Web sites, trust networks or the Web graph itself-are often directed. Few community detection algorithms deal with directed graphs but we lack their experimental comparison. In this paper we evaluated some community detection algorithms across accuracy and scalability. A first group of algorithms (Label Propagation and Infomap) are explicitly designed to manage directed graphs while a second group (e.g., WalkTrap) simply ignores edge directionality; finally, a third group of algorithms (e.g., Eigenvector) maps input graphs onto undirected ones and extracts communities from the symmetrized version of the input graph. We ran our tests on both artificial and real graphs and, on artificial graphs, WalkTrap achieved the highest accuracy, closely followed by other algorithms; Label Propagation has outstanding performance in scalability on both artificial and real graphs. The Infomap algorithm showcased the best trade-off between accuracy and computational performance and, therefore, it has to be considered as a promising tool for Web Data Analytics purposes.
AB - Detecting communities in graphs is a fundamental tool to understand the structure of Web-based systems and predict their evolution. Many community detection algorithms are designed to process undirected graphs (i.e., graphs with bidirectional edges) but many graphs on the Web-e.g., microblogging Web sites, trust networks or the Web graph itself-are often directed. Few community detection algorithms deal with directed graphs but we lack their experimental comparison. In this paper we evaluated some community detection algorithms across accuracy and scalability. A first group of algorithms (Label Propagation and Infomap) are explicitly designed to manage directed graphs while a second group (e.g., WalkTrap) simply ignores edge directionality; finally, a third group of algorithms (e.g., Eigenvector) maps input graphs onto undirected ones and extracts communities from the symmetrized version of the input graph. We ran our tests on both artificial and real graphs and, on artificial graphs, WalkTrap achieved the highest accuracy, closely followed by other algorithms; Label Propagation has outstanding performance in scalability on both artificial and real graphs. The Infomap algorithm showcased the best trade-off between accuracy and computational performance and, therefore, it has to be considered as a promising tool for Web Data Analytics purposes.
KW - Web data analytics
KW - Graph analytics
KW - Community detection and clustering
KW - Directed graphs
U2 - 10.1109/TBDATA.2016.2631512
DO - 10.1109/TBDATA.2016.2631512
M3 - Journal article
SN - 2332-7790
VL - 3
SP - 289
EP - 306
JO - IEEE Transactions on Big Data
JF - IEEE Transactions on Big Data
IS - 3
ER -