Clustering cliques for graph-based summarization of the biomedical research literature

Han Zhang, Marcelo Fiszman, Dongwook Shin, Bartlomiej Wilkowski, Thomas C. Rindflesch

Research output: Contribution to journal > Journal article > Research > peer-review

Abstract

Background: Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts).

Results: SemRep was used to extract semantic predications from the citations returned by a PubMed search. Cliques were identified from frequently occurring predications with highly connected arguments filtered by degree centrality. Themes contained in the summary were identified with a hierarchical clustering algorithm based on common arguments shared among cliques. The validity of the clusters in the summaries produced was compared to the Silhouette-generated baseline for cohesion, separation and overall validity. The theme labels were also compared to a reference standard produced with major MeSH headings.

Conclusions: For 11 topics in the testing data set, the overall validity of clusters from the system summary was 10 percentage points higher than the baseline (43% versus 33%). When compared to the reference standard from MeSH headings, the results for recall, precision and F-score were 0.64, 0.65, and 0.65 respectively.
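The pipeline outlined in the abstract (degree-centrality filtering, clique identification, hierarchical clustering of cliques by shared arguments) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy predications, the thresholds, the use of Jaccard overlap as the similarity measure, and the single-linkage merge rule are all assumptions made for illustration, and the frequency filtering, SemRep extraction and Silhouette-based evaluation steps are omitted.

```python
from itertools import combinations

# Hypothetical toy predications (subject, predicate, object); not from the paper.
PREDICATIONS = [
    ("a", "REL", "b"), ("a", "REL", "c"), ("b", "REL", "c"),
    ("b", "REL", "d"), ("c", "REL", "d"),
    ("x", "REL", "y"), ("x", "REL", "z"), ("y", "REL", "z"),
    ("x", "REL", "w"),  # "w" is weakly connected and gets filtered out
]

def build_graph(predications):
    """Undirected graph: arguments are nodes, each predication is an edge."""
    adj = {}
    for subj, _pred, obj in predications:
        adj.setdefault(subj, set()).add(obj)
        adj.setdefault(obj, set()).add(subj)
    return adj

def filter_by_degree_centrality(adj, min_centrality):
    """Keep nodes whose degree / (n - 1) reaches the (assumed) threshold."""
    n = len(adj)
    keep = {v for v, nbrs in adj.items() if len(nbrs) / (n - 1) >= min_centrality}
    return {v: adj[v] & keep for v in keep}

def maximal_cliques(adj, min_size=3):
    """Bron-Kerbosch enumeration of maximal cliques (no pivoting)."""
    found = []
    def expand(r, p, x):
        if not p and not x:
            if len(r) >= min_size:
                found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return found

def jaccard(s, t):
    """Share of arguments two cliques have in common."""
    return len(s & t) / len(s | t)

def linkage(c1, c2):
    """Single linkage: similarity of the closest pair of member cliques."""
    return max(jaccard(s, t) for s in c1 for t in c2)

def cluster_cliques(cliques, min_similarity):
    """Agglomerative clustering: merge the most similar clusters until
    no pair reaches min_similarity (an assumed stopping rule)."""
    clusters = [[c] for c in cliques]
    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) < min_similarity:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

graph = filter_by_degree_centrality(build_graph(PREDICATIONS), min_centrality=0.25)
cliques = maximal_cliques(graph)
clusters = cluster_cliques(cliques, min_similarity=0.3)
# Each theme is the union of the arguments in one cluster of cliques.
themes = sorted(sorted(set().union(*cluster)) for cluster in clusters)
print(themes)
```

On this toy input, the two overlapping cliques {a, b, c} and {b, c, d} merge into one theme, while {x, y, z} stays separate. In practice a graph library and a standard hierarchical clustering routine would likely replace the hand-written functions.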
Original language: English
Article number: 182
Journal: BMC Bioinformatics
Volume: 14
Number of pages: 15
ISSN: 1471-2105
Publication status: Published - 2013

Bibliographical note

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Keywords

  • Graphic methods
  • Semantics
  • Clustering algorithms
