Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd

Zichen Wang, Caroline D. Monteiro, Kathleen M. Jagodnik, Nicolas F. Fernandez, Gregory W. Gundersen, Andrew D. Rouillard, Sherry L. Jenkins, Axel S. Feldmann, Kevin S. Hu, Michael G. McDermott, Qiaonan Duan, Neil R. Clark, Matthew R. Jones, Yan Kou, Troy Goff, Holly Woodland, Fabio M R Amaral, Gregory L. Szeto, Oliver Fuchs, Sophia M. Schüssler-Fiorenza Rose, Shvetank Sharma, Uwe Schwartz, Xabier Bengoetxea Bausela, Maciej Szymkiewicz, Vasileios Maroulis, Anton Salykin, Carolina M. Barra, Candice D. Kruth, Nicholas J. Bongio, Vaibhav Mathur, Radmila D. Todoric, Udi E. Rubin, Apostolos Malatras, Carl T. Fulp, John A. Galindo, Ruta Motiejunaite, Christoph Jüschke, Philip C. Dishuck, Katharina Lahl, Mohieddin Jafari, Sara Aibar, Apostolos Zaravinos, Linda H. Steenhuizen, Lindsey R. Allison, Pablo Gamallo, Fernando De Andres Segura, Tyler Dae Devlin, Vicente Pérez-García, Avi Ma'ayan

    Research output: Contribution to journal, Journal article, Research, peer-review

    Abstract

    Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but requires further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expression profiles from Gene Expression Omnibus (GEO). Through a massive open online course on Coursera, over 70 participants from over 25 countries identify and annotate 2,460 single-gene perturbation signatures, 839 disease versus normal signatures, and 906 drug perturbation signatures. All these signatures are unique and are manually validated for quality. Global analysis of these signatures confirms known associations and identifies novel associations between genes, diseases and drugs. The manually curated signatures are used as a training set to develop classifiers for extracting similar signatures from the entire GEO repository. We develop a web portal to serve these signatures for query, download and visualization.
    Original language: English
    Article number: 12846
    Journal: Nature Communications
    Volume: 7
    Number of pages: 11
    ISSN: 2041-1723
    DOI: https://doi.org/10.1038/ncomms12846
    Publication status: Published - 2016

    Keywords

    • Chemistry (all)
    • Biochemistry, Genetics and Molecular Biology (all)
    • Physics and Astronomy (all)

    Cite this

    Wang, Z., Monteiro, C. D., Jagodnik, K. M., Fernandez, N. F., Gundersen, G. W., Rouillard, A. D., ... Ma'ayan, A. (2016). Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd. Nature Communications, 7, [12846]. https://doi.org/10.1038/ncomms12846
    Wang, Zichen ; Monteiro, Caroline D. ; Jagodnik, Kathleen M. ; Fernandez, Nicolas F. ; Gundersen, Gregory W. ; Rouillard, Andrew D. ; Jenkins, Sherry L. ; Feldmann, Axel S. ; Hu, Kevin S. ; McDermott, Michael G. ; Duan, Qiaonan ; Clark, Neil R. ; Jones, Matthew R. ; Kou, Yan ; Goff, Troy ; Woodland, Holly ; Amaral, Fabio M R ; Szeto, Gregory L. ; Fuchs, Oliver ; Schüssler-Fiorenza Rose, Sophia M. ; Sharma, Shvetank ; Schwartz, Uwe ; Bausela, Xabier Bengoetxea ; Szymkiewicz, Maciej ; Maroulis, Vasileios ; Salykin, Anton ; Barra, Carolina M. ; Kruth, Candice D. ; Bongio, Nicholas J. ; Mathur, Vaibhav ; Todoric, Radmila D. ; Rubin, Udi E. ; Malatras, Apostolos ; Fulp, Carl T. ; Galindo, John A. ; Motiejunaite, Ruta ; Jüschke, Christoph ; Dishuck, Philip C. ; Lahl, Katharina ; Jafari, Mohieddin ; Aibar, Sara ; Zaravinos, Apostolos ; Steenhuizen, Linda H. ; Allison, Lindsey R. ; Gamallo, Pablo ; De Andres Segura, Fernando ; Dae Devlin, Tyler ; Pérez-García, Vicente ; Ma'ayan, Avi. / Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd. In: Nature Communications. 2016 ; Vol. 7.
    @article{d703c2248cf543848563abe57eb4ddb3,
    title = "Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd",
    abstract = "Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but requires further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expression profiles from Gene Expression Omnibus (GEO). Through a massive open online course on Coursera, over 70 participants from over 25 countries identify and annotate 2,460 single-gene perturbation signatures, 839 disease versus normal signatures, and 906 drug perturbation signatures. All these signatures are unique and are manually validated for quality. Global analysis of these signatures confirms known associations and identifies novel associations between genes, diseases and drugs. The manually curated signatures are used as a training set to develop classifiers for extracting similar signatures from the entire GEO repository. We develop a web portal to serve these signatures for query, download and visualization.",
    keywords = "Chemistry (all), Biochemistry, Genetics and Molecular Biology (all), Physics and Astronomy (all)",
    author = "Zichen Wang and Monteiro, {Caroline D.} and Jagodnik, {Kathleen M.} and Fernandez, {Nicolas F.} and Gundersen, {Gregory W.} and Rouillard, {Andrew D.} and Jenkins, {Sherry L.} and Feldmann, {Axel S.} and Hu, {Kevin S.} and McDermott, {Michael G.} and Qiaonan Duan and Clark, {Neil R.} and Jones, {Matthew R.} and Yan Kou and Troy Goff and Holly Woodland and Amaral, {Fabio M R} and Szeto, {Gregory L.} and Oliver Fuchs and {Sch{\"u}ssler-Fiorenza Rose}, {Sophia M.} and Shvetank Sharma and Uwe Schwartz and Bausela, {Xabier Bengoetxea} and Maciej Szymkiewicz and Vasileios Maroulis and Anton Salykin and Barra, {Carolina M.} and Kruth, {Candice D.} and Bongio, {Nicholas J.} and Vaibhav Mathur and Todoric, {Radmila D.} and Rubin, {Udi E.} and Apostolos Malatras and Fulp, {Carl T.} and Galindo, {John A.} and Ruta Motiejunaite and Christoph J{\"u}schke and Dishuck, {Philip C.} and Katharina Lahl and Mohieddin Jafari and Sara Aibar and Apostolos Zaravinos and Steenhuizen, {Linda H.} and Allison, {Lindsey R.} and Pablo Gamallo and {De Andres Segura}, Fernando and {Dae Devlin}, Tyler and Vicente P{\'e}rez-Garc{\'i}a and Avi Ma'ayan",
    year = "2016",
    doi = "10.1038/ncomms12846",
    language = "English",
    volume = "7",
    journal = "Nature Communications",
    issn = "2041-1723",
    publisher = "Nature Publishing Group",

    }

    Wang, Z, Monteiro, CD, Jagodnik, KM, Fernandez, NF, Gundersen, GW, Rouillard, AD, Jenkins, SL, Feldmann, AS, Hu, KS, McDermott, MG, Duan, Q, Clark, NR, Jones, MR, Kou, Y, Goff, T, Woodland, H, Amaral, FMR, Szeto, GL, Fuchs, O, Schüssler-Fiorenza Rose, SM, Sharma, S, Schwartz, U, Bausela, XB, Szymkiewicz, M, Maroulis, V, Salykin, A, Barra, CM, Kruth, CD, Bongio, NJ, Mathur, V, Todoric, RD, Rubin, UE, Malatras, A, Fulp, CT, Galindo, JA, Motiejunaite, R, Jüschke, C, Dishuck, PC, Lahl, K, Jafari, M, Aibar, S, Zaravinos, A, Steenhuizen, LH, Allison, LR, Gamallo, P, De Andres Segura, F, Dae Devlin, T, Pérez-García, V & Ma'ayan, A 2016, 'Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd', Nature Communications, vol. 7, 12846. https://doi.org/10.1038/ncomms12846

    Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd. / Wang, Zichen; Monteiro, Caroline D.; Jagodnik, Kathleen M.; Fernandez, Nicolas F.; Gundersen, Gregory W.; Rouillard, Andrew D.; Jenkins, Sherry L.; Feldmann, Axel S.; Hu, Kevin S.; McDermott, Michael G.; Duan, Qiaonan; Clark, Neil R.; Jones, Matthew R.; Kou, Yan; Goff, Troy; Woodland, Holly; Amaral, Fabio M R; Szeto, Gregory L.; Fuchs, Oliver; Schüssler-Fiorenza Rose, Sophia M.; Sharma, Shvetank; Schwartz, Uwe; Bausela, Xabier Bengoetxea; Szymkiewicz, Maciej; Maroulis, Vasileios; Salykin, Anton; Barra, Carolina M.; Kruth, Candice D.; Bongio, Nicholas J.; Mathur, Vaibhav; Todoric, Radmila D.; Rubin, Udi E.; Malatras, Apostolos; Fulp, Carl T.; Galindo, John A.; Motiejunaite, Ruta; Jüschke, Christoph; Dishuck, Philip C.; Lahl, Katharina; Jafari, Mohieddin; Aibar, Sara; Zaravinos, Apostolos; Steenhuizen, Linda H.; Allison, Lindsey R.; Gamallo, Pablo; De Andres Segura, Fernando; Dae Devlin, Tyler; Pérez-García, Vicente; Ma'ayan, Avi.

    In: Nature Communications, Vol. 7, 12846, 2016.

    TY - JOUR
    T1 - Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd
    AU - Wang, Zichen
    AU - Monteiro, Caroline D.
    AU - Jagodnik, Kathleen M.
    AU - Fernandez, Nicolas F.
    AU - Gundersen, Gregory W.
    AU - Rouillard, Andrew D.
    AU - Jenkins, Sherry L.
    AU - Feldmann, Axel S.
    AU - Hu, Kevin S.
    AU - McDermott, Michael G.
    AU - Duan, Qiaonan
    AU - Clark, Neil R.
    AU - Jones, Matthew R.
    AU - Kou, Yan
    AU - Goff, Troy
    AU - Woodland, Holly
    AU - Amaral, Fabio M R
    AU - Szeto, Gregory L.
    AU - Fuchs, Oliver
    AU - Schüssler-Fiorenza Rose, Sophia M.
    AU - Sharma, Shvetank
    AU - Schwartz, Uwe
    AU - Bausela, Xabier Bengoetxea
    AU - Szymkiewicz, Maciej
    AU - Maroulis, Vasileios
    AU - Salykin, Anton
    AU - Barra, Carolina M.
    AU - Kruth, Candice D.
    AU - Bongio, Nicholas J.
    AU - Mathur, Vaibhav
    AU - Todoric, Radmila D.
    AU - Rubin, Udi E.
    AU - Malatras, Apostolos
    AU - Fulp, Carl T.
    AU - Galindo, John A.
    AU - Motiejunaite, Ruta
    AU - Jüschke, Christoph
    AU - Dishuck, Philip C.
    AU - Lahl, Katharina
    AU - Jafari, Mohieddin
    AU - Aibar, Sara
    AU - Zaravinos, Apostolos
    AU - Steenhuizen, Linda H.
    AU - Allison, Lindsey R.
    AU - Gamallo, Pablo
    AU - De Andres Segura, Fernando
    AU - Dae Devlin, Tyler
    AU - Pérez-García, Vicente
    AU - Ma'ayan, Avi
    PY - 2016
    Y1 - 2016
    N2 - Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but requires further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expression profiles from Gene Expression Omnibus (GEO). Through a massive open online course on Coursera, over 70 participants from over 25 countries identify and annotate 2,460 single-gene perturbation signatures, 839 disease versus normal signatures, and 906 drug perturbation signatures. All these signatures are unique and are manually validated for quality. Global analysis of these signatures confirms known associations and identifies novel associations between genes, diseases and drugs. The manually curated signatures are used as a training set to develop classifiers for extracting similar signatures from the entire GEO repository. We develop a web portal to serve these signatures for query, download and visualization.
    AB - Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but requires further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expression profiles from Gene Expression Omnibus (GEO). Through a massive open online course on Coursera, over 70 participants from over 25 countries identify and annotate 2,460 single-gene perturbation signatures, 839 disease versus normal signatures, and 906 drug perturbation signatures. All these signatures are unique and are manually validated for quality. Global analysis of these signatures confirms known associations and identifies novel associations between genes, diseases and drugs. The manually curated signatures are used as a training set to develop classifiers for extracting similar signatures from the entire GEO repository. We develop a web portal to serve these signatures for query, download and visualization.
    KW - Chemistry (all)
    KW - Biochemistry, Genetics and Molecular Biology (all)
    KW - Physics and Astronomy (all)
    U2 - 10.1038/ncomms12846
    DO - 10.1038/ncomms12846
    M3 - Journal article
    C2 - 27667448
    VL - 7
    JO - Nature Communications
    JF - Nature Communications
    SN - 2041-1723
    M1 - 12846
    ER -

    Wang Z, Monteiro CD, Jagodnik KM, Fernandez NF, Gundersen GW, Rouillard AD et al. Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd. Nature Communications. 2016;7. 12846. https://doi.org/10.1038/ncomms12846