Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd

Zichen Wang, Caroline D. Monteiro, Kathleen M. Jagodnik, Nicolas F. Fernandez, Gregory W. Gundersen, Andrew D. Rouillard, Sherry L. Jenkins, Axel S. Feldmann, Kevin S. Hu, Michael G. McDermott, Qiaonan Duan, Neil R. Clark, Matthew R. Jones, Yan Kou, Troy Goff, Holly Woodland, Fabio M R Amaral, Gregory L. Szeto, Oliver Fuchs, Sophia M. Schüssler-Fiorenza Rose, Shvetank Sharma, Uwe Schwartz, Xabier Bengoetxea Bausela, Maciej Szymkiewicz, Vasileios Maroulis, Anton Salykin, Carolina M. Barra, Candice D. Kruth, Nicholas J. Bongio, Vaibhav Mathur, Radmila D. Todoric, Udi E. Rubin, Apostolos Malatras, Carl T. Fulp, John A. Galindo, Ruta Motiejunaite, Christoph Jüschke, Philip C. Dishuck, Katharina Lahl, Mohieddin Jafari, Sara Aibar, Apostolos Zaravinos, Linda H. Steenhuizen, Lindsey R. Allison, Pablo Gamallo, Fernando De Andres Segura, Tyler Dae Devlin, Vicente Pérez-García, Avi Ma'ayan

    Research output: Contribution to journal; Journal article; Research; peer-review

    Abstract

    Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but requires further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expression profiles from Gene Expression Omnibus (GEO). Through a massive open online course on Coursera, over 70 participants from over 25 countries identify and annotate 2,460 single-gene perturbation signatures, 839 disease versus normal signatures, and 906 drug perturbation signatures. All these signatures are unique and are manually validated for quality. Global analysis of these signatures confirms known associations and identifies novel associations between genes, diseases and drugs. The manually curated signatures are used as a training set to develop classifiers for extracting similar signatures from the entire GEO repository. We develop a web portal to serve these signatures for query, download and visualization.
    Original language: English
    Article number: 12846
    Journal: Nature Communications
    Volume: 7
    Number of pages: 11
    ISSN: 2041-1723
    DOI: https://doi.org/10.1038/ncomms12846
    Publication status: Published - 2016

    Keywords

    • Chemistry (all)
    • Biochemistry, Genetics and Molecular Biology (all)
    • Physics and Astronomy (all)

    Cite this

    Wang, Z., Monteiro, C. D., Jagodnik, K. M., Fernandez, N. F., Gundersen, G. W., Rouillard, A. D., ... Ma'ayan, A. (2016). Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd. Nature Communications, 7, [12846]. https://doi.org/10.1038/ncomms12846