GEDI: an R package for integration of transcriptomic data from multiple high-throughput platforms

Mathias N. Stokholm, Maria Belen Rabaglino, Haja Kadarmideen*

*Corresponding author for this work

Research output: Contribution to conference › Conference abstract for conference › Research › peer-review



Transcriptomic data is often more expensive and difficult to generate in large cohorts than genomic data. It is therefore often important to integrate multiple transcriptomic datasets, from both microarray and next-generation sequencing (NGS) platforms, across similar experiments or clinical trials to improve analytical power and the discovery of novel transcripts and genes. However, transcriptomic data integration presents a few challenges, including re-annotation and batch effect removal. We developed the Gene Expression Data Integration (GEDI) R package to enable transcriptomic data integration by combining existing R packages. With just four functions, the GEDI R package makes constructing a transcriptomic data integration pipeline straightforward. Together, the functions overcome the complications of transcriptomic data integration by automatically re-annotating the data and removing the batch effect. Removal of the batch effect is verified with principal component analysis (PCA), and the data integration is verified using a logistic regression model with forward stepwise feature selection. To demonstrate the functionality of the GEDI package, we integrated five bovine endometrial transcriptomic datasets from the NCBI Gene Expression Omnibus. The datasets included Affymetrix, Agilent and RNA-sequencing data. Furthermore, we compared the GEDI package with existing tools and found that GEDI is the only tool that provides a full transcriptomic data integration pipeline, including verification of both batch effect removal and data integration.
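The idea of checking batch effect removal with PCA can be illustrated with a small sketch. This is not the GEDI API, and it is written in Python rather than R. Per-batch mean-centering is used here only as a simple stand-in for a real batch correction method, the data are synthetic, and all function names (`center_by_batch`, `first_pc_scores`, `batch_separation`) are hypothetical. The sketch projects samples onto the first principal component and measures how much of the variance along it is explained by batch membership, before and after correction. The logistic regression check mentioned in the abstract is not shown.

```python
import random


def center_by_batch(samples, batches):
    """Subtract each batch's per-gene mean (a stand-in for batch correction)."""
    n_genes = len(samples[0])
    corrected = [None] * len(samples)
    for b in set(batches):
        idx = [i for i, lab in enumerate(batches) if lab == b]
        means = [sum(samples[i][g] for i in idx) / len(idx) for g in range(n_genes)]
        for i in idx:
            corrected[i] = [samples[i][g] - means[g] for g in range(n_genes)]
    return corrected


def first_pc_scores(samples, n_iter=200):
    """Project samples onto the first principal component via power iteration."""
    n, p = len(samples), len(samples[0])
    col_means = [sum(row[g] for row in samples) / n for g in range(p)]
    x = [[row[g] - col_means[g] for g in range(p)] for row in samples]
    rng = random.Random(1)  # fixed seed so the start vector is reproducible
    v = [rng.gauss(0, 1) for _ in range(p)]
    for _ in range(n_iter):
        scores = [sum(r[g] * v[g] for g in range(p)) for r in x]            # X v
        w = [sum(x[i][g] * scores[i] for i in range(n)) for g in range(p)]  # X^T X v
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0:
            break
        v = [c / norm for c in w]
    return [sum(r[g] * v[g] for g in range(p)) for r in x]


def batch_separation(scores, batches):
    """Fraction of variance in the scores explained by batch (0 = none, 1 = all)."""
    grand = sum(scores) / len(scores)
    total = sum((s - grand) ** 2 for s in scores)
    if total == 0:
        return 0.0
    between = 0.0
    for b in set(batches):
        grp = [s for s, lab in zip(scores, batches) if lab == b]
        m = sum(grp) / len(grp)
        between += len(grp) * (m - grand) ** 2
    return between / total


# Synthetic toy data: 12 samples x 5 genes, batch "B" shifted by a constant offset.
rng = random.Random(0)
batches = ["A"] * 6 + ["B"] * 6
samples = [
    [rng.gauss(0, 1) + (5.0 if lab == "B" else 0.0) for _ in range(5)]
    for lab in batches
]

before = batch_separation(first_pc_scores(samples), batches)
after = batch_separation(first_pc_scores(center_by_batch(samples, batches)), batches)
print(f"PC1 variance explained by batch: before={before:.3f}, after={after:.3g}")
```

In this toy setting the first principal component is dominated by the batch offset before correction and carries essentially no batch signal afterwards. With real data, mean-centering would also remove any biological signal that is confounded with batch, which is one reason dedicated correction methods exist.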
Original language: English
Publication date: 2021
Number of pages: 8
Publication status: Published - 2021
Event: 6th Annual Danish Bioinformatics Conference - Aalborg, Denmark
Duration: 18 Nov 2021 to 19 Nov 2021


Conference: 6th Annual Danish Bioinformatics Conference

