The effect of sample processing on microbiomes using metagenomics characterization

Casper Sahl Poulsen*

*Corresponding author for this work

Research output: Book/Report, Ph.D. thesis, Research


Advancements in sequencing technologies allow unprecedentedly detailed DNA- and RNA-based characterization of complex microbial communities. Metagenomics is the direct study of genetic material from a microbiome, and it can be implemented in diagnostics, surveillance and the clinical prediction of a number of diseases. The method has huge potential because it bypasses classical isolation, cultivation and characterization of microorganisms, as well as the use of marker genes limited to specific groups of organisms. In addition, the functional capabilities of a microbial community can be investigated, such as patterns in antimicrobial resistance genes. There is increasing concern regarding the bias associated with sample and data processing, including sampling, storage, DNA isolation, library preparation, sequencing, data manipulation and statistical analysis. The aim of this PhD is to provide guidelines on sample storage, library preparation, sequencing and data processing in the context of developing a global real-time data sharing, analysis and interpretation platform.
All data in this PhD were generated from two pig feces samples (P1 and P2) and two sewage samples (S1 and S2), from which a large number of aliquots were obtained and used to investigate different sample processing parameters. In manuscript I, the effect of short-term storage was investigated in the context of time (0h, 16h and 64h) and temperature (22°C, 5°C, -20°C and -80°C). In addition, a well-characterized mock community consisting of live cells was used to spike a portion of the samples (P1, P2, S1 and S2) when they arrived in the laboratory. In manuscript II, a subset of unspiked samples (0h and 64h stored at -80°C) was used to investigate different library preparation kits (KAPA PCR-free, NEXTflex PCR-free and Nextera) and sequencing platforms (HiSeq and NextSeq). Additional investigation of storage included subjecting spiked and unspiked aliquots from P1 and S1 to freeze-thawing (1, 2, 3, 4 and 5 freeze-thaw cycles) and long-term freezing (4, 8 and 12 months). The data in manuscript III were generated in silico to assess the robustness of using different combinations of data pre-processing (normalization, transformation and standardization) and beta-diversity metrics in the context of multivariate visualization.
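To illustrate why the choice of pre-processing matters for beta-diversity, the following is a minimal sketch and not the thesis pipeline. It assumes total-sum scaling as the normalization and Bray-Curtis as the dissimilarity; the function names and the toy count vectors are hypothetical and are not data from the thesis.

```python
def total_sum_scale(counts):
    """Normalization (assumed here): convert raw counts to relative abundances."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no counts")
    return [c / total for c in counts]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance profiles."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


# Hypothetical toy counts for three taxa (not data from the thesis).
sample_a = [10, 20, 70]
sample_b = [100, 200, 700]  # same composition as sample_a, ten times the depth
sample_c = [70, 20, 10]     # same depth as sample_a, different composition

# Without normalization, sequencing depth alone produces a large dissimilarity.
raw_ab = bray_curtis(sample_a, sample_b)

# After normalization, only compositional differences remain.
norm_ab = bray_curtis(total_sum_scale(sample_a), total_sum_scale(sample_b))
norm_ac = bray_curtis(total_sum_scale(sample_a), total_sum_scale(sample_c))

print(f"raw a vs b:        {raw_ab:.3f}")
print(f"normalized a vs b: {norm_ab:.3f}")
print(f"normalized a vs c: {norm_ac:.3f}")
```

In this toy case the two samples with identical composition look very different on raw counts and identical after scaling, which is the kind of sensitivity to pre-processing that a robustness assessment would need to expose.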
Storage, library preparation and sequencing have systematic effects on the microbial characterization of samples (P1, P2, S1 and S2). In manuscript I, frozen samples clustered together regardless of being stored at -20°C or -80°C. Storing samples at room temperature revealed that the microbial community changed along a gradient corresponding to the time of storage. Similar results were observed when storing samples in the fridge, but when the storage time was limited to 16h, samples remained similar to the samples that were processed immediately upon arrival in the laboratory. A significant effect of storage was observed using permutational multivariate analysis of variance (PERMANOVA), but it was still possible to differentiate the samples (P1, P2, S1 and S2) from each other. In manuscript II, library preparation and sequencing bias were at a level that made it difficult to differentiate between the two pig feces samples, highlighting that sample processing bias is a concern when comparing similar samples. Comparing storage (limited to two conditions), library preparation and sequencing platform with PERMANOVA revealed that the latter two introduced the largest variation into the data. In both manuscripts I and II, it was possible to associate specific groups of organisms with sample processing. Manuscript III provides theoretical and applied perspectives on how to pre-process data and calculate beta-diversity, together with a framework for other researchers to assess the robustness of their data.
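For readers unfamiliar with PERMANOVA, the following is a minimal sketch of a one-way test on a precomputed distance matrix, using the standard pseudo-F statistic and label permutation. It is an illustration under stated assumptions, not the analysis code used in the thesis: the function names, the group labels and the toy distance matrix are all hypothetical.

```python
import random


def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F from a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: squared pairwise distances divided by sample count.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares, each group divided by its own size.
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(
            dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:]
        ) / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, permutations=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break any association between labels and distances
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the observed labelling is counted.
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy example: two groups of three samples each, with small
# within-group distances (0.1) and large between-group distances (0.9).
labels = ["groupA"] * 3 + ["groupB"] * 3
dist = [
    [0.0 if i == j else (0.1 if labels[i] == labels[j] else 0.9) for j in range(6)]
    for i in range(6)
]

observed_f, p_value = permanova(dist, labels)
print(f"pseudo-F = {observed_f:.1f}, p = {p_value:.3f}")
```

With only six samples, few distinct label arrangements exist, so the smallest attainable p-value is limited; a real analysis would use many more samples and an established implementation.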
To generate comparable data, it is recommended to keep processing parameters the same throughout a study. When this is not possible, thorough reporting and depositing of raw data and metadata are highly important so that other researchers can validate the results; this also holds in general for replication purposes. The overall recommendation for limiting storage bias is to process samples immediately upon arrival in the laboratory or, alternatively, to store them in the fridge and process them on the following day. If this is not possible for every sample, all samples should be frozen before further processing to stabilize the community and introduce a similar bias across samples.
Considering that the data were derived from only four microbiomes (P1, P2, S1 and S2), caution should be exercised not to overemphasize the results obtained in the thesis or to extrapolate the recommendations to all sample types. However, the investigations were comprehensive, and the many parameters investigated resulted in the sequencing of 438 metagenomes, representing one of the largest validation efforts to investigate the effect of sample processing in metagenomics. This project highlights the need for continuous validation to establish metagenomics as a diagnostic, surveillance and disease prediction tool.
Original language: English
Place of Publication: Kgs. Lyngby, Denmark
Publisher: Technical University of Denmark
Number of pages: 91
Publication status: Published - 2019