Contaminating viral sequences in high-throughput sequencing viromics: a linkage study of 700 sequencing libraries

Maria Asplund, Kristín Rós Kjartansdóttir, Sarah Mollerup, Lasse Vinner, Helena Fridholm, José A. R. Herrera, Jens Friis-Nielsen, Thomas Arn Hansen, Randi Holm Jensen, Ida Broman Nielsen, Stine Raith Richter, Alba Rey-Iglesia, Maria Luisa Matey-Hernandez, David E. Alquezar-Planas, Pernille V. S. Olsen, Thomas Sicheritz-Pontén, Eske Willerslev, Ole Lund, Søren Brunak, Tobias Mourier, Lars Peter Nielsen, Jose M. G. Izarzugaza, Anders Johannes Hansen*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review



Sample preparation for high-throughput sequencing (HTS) includes treatment with various laboratory components that potentially carry viral nucleic acids, the extent of which has not been thoroughly investigated. Our aim was to systematically examine a diverse repertoire of laboratory components used to prepare samples for HTS in order to identify contaminating viral sequences. A total of 322 samples of mainly human origin were analysed using eight protocols, applying a wide variety of laboratory components. Several samples (60% of human specimens) were processed by different protocols. In total, 712 sequencing libraries were investigated for viral sequence contamination. Among sequences showing similarity to viruses, 493 were significantly associated with the use of laboratory components. Each of these viral sequences appeared sporadically, being identified in only a subset of the samples treated with the linked laboratory component, and some were not identified in the non-template control (NTC) samples. Remarkably, more than 65% of all viral sequences identified were within viral clusters linked to the use of laboratory components. We show that a high prevalence of contaminating viral sequences can be expected in HTS-based virome data, and we provide an extensive list of novel contaminating viral sequences that can be used to evaluate viral findings in future virome and metagenome studies. Moreover, we show that detection can be problematic because of stochastic appearance and limited NTCs. Although the exact origin of these viral sequences requires further research, our results support laboratory-component-linked viral sequence contamination of both biological and synthetic origin.
Original language: English
Journal: Clinical Microbiology and Infection
Issue number: 10
Pages (from-to): 1277-1285
Publication status: Published - 2019


Keywords:
  • Contamination
  • Cluster
  • High-throughput sequencing
  • Laboratory component
  • Metagenomic
  • Next generation sequencing
  • Nucleic acid
  • Virome
  • Virus

