Performance and precision of double digestion RAD (ddRAD) genotyping in large multiplexed datasets of marine fish species

F. Maroso*, J E J Hillen, B. G. Pardo, K. Gkagkavouzis, I. Coscia, M. Hermida, R. Franch, B. Hellemans, J. Van Houdt, B. Simionati, J. B. Taggart, Einar Eg Nielsen, G. Maes, S. A. Ciavaglia, L. M. I. Webster, F. A. M. Volckaert, P. Martinez, L. Bargelloni, R. Ogden, Consortium AquaTrace

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

The development of Genotyping-By-Sequencing (GBS) technologies enables cost-effective analysis of large numbers of Single Nucleotide Polymorphisms (SNPs), especially in "non-model" species. Nevertheless, as such technologies enter a mature phase, biases and errors inherent to GBS are becoming evident. Here, we evaluated the performance of double digest Restriction enzyme Associated DNA (ddRAD) sequencing in SNP genotyping studies including a high number of samples. Datasets of sequence data were generated from three marine teleost species (>5500 samples, >2.5 × 10^12 bases in total), using a standardized protocol. A common bioinformatics pipeline based on STACKS was established, with and without the use of a reference genome. We performed analyses throughout the production and analysis of ddRAD data in order to explore (i) the loss of information due to heterogeneous raw read number across samples; (ii) the discrepancy between expected and observed tag length and coverage; (iii) the performance of reference-based vs. de novo approaches; (iv) the sources of potential genotyping errors of the library preparation/bioinformatics protocol, by comparing technical replicates. Our results showed that use of a reference genome and a posteriori genotype correction improved genotyping precision. Individual read coverage was a key variable for reproducibility; variance in sequencing depth between loci in the same individual was also identified as an important factor and found to correlate with tag length. A comparison of downstream analysis carried out with ddRAD vs. single SNP allele-specific assay genotypes provided information about the levels of genotyping imprecision that can have a significant impact on allele frequency estimations and population assignment. The results and insights presented here will help to select and improve approaches to the analysis of large datasets based on RAD-like methodologies.
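The abstract mentions assessing genotyping errors by comparing technical replicates. As a rough illustration only, and not the authors' pipeline, the following Python sketch computes a per-pair genotype mismatch rate between two replicate runs of the same individual. The function name `replicate_mismatch_rate`, the genotype encoding (a pair of allele strings, with `None` for a missing call), and the toy data are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: a diploid genotype is a pair of alleles,
# and None marks a locus with no call in that replicate.
Genotype = Optional[Tuple[str, str]]


def replicate_mismatch_rate(rep_a: List[Genotype],
                            rep_b: List[Genotype]) -> Tuple[float, int]:
    """Return (mismatch rate, number of loci compared) for two replicates.

    Loci missing in either replicate are skipped. Allele order is ignored,
    so ("A", "G") and ("G", "A") count as the same genotype.
    """
    if len(rep_a) != len(rep_b):
        raise ValueError("replicates must cover the same loci in the same order")

    compared = 0
    mismatched = 0
    for a, b in zip(rep_a, rep_b):
        if a is None or b is None:
            continue  # cannot compare a missing call
        compared += 1
        if tuple(sorted(a)) != tuple(sorted(b)):
            mismatched += 1

    rate = mismatched / compared if compared else float("nan")
    return rate, compared


# Toy data (invented for illustration): five loci, two replicate runs.
rep1 = [("A", "A"), ("A", "G"), ("C", "T"), None, ("G", "G")]
rep2 = [("A", "A"), ("G", "A"), ("C", "C"), ("T", "T"), ("G", "G")]

rate, n_loci = replicate_mismatch_rate(rep1, rep2)
print(f"compared {n_loci} loci, mismatch rate = {rate:.2f}")
```

In this toy case four loci are compared and one differs (a heterozygote called as a homozygote), which is the kind of discrepancy that low read coverage would be expected to produce. A real analysis would typically also stratify the rate by read depth per locus.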
Original language: English
Journal: Marine Genomics
Volume: 39
Pages (from-to): 64-72
ISSN: 1874-7787
DOI: https://doi.org/10.1016/j.margen.2018.02.002
Publication status: Published - 2018

Keywords

  • European sea bass
  • GBS
  • Gilthead sea bream
  • Sequencing precision
  • Turbot
  • ddRAD

Cite this

Maroso, F., Hillen, J. E. J., Pardo, B. G., Gkagkavouzis, K., Coscia, I., Hermida, M., ... AquaTrace, C. (2018). Performance and precision of double digestion RAD (ddRAD) genotyping in large multiplexed datasets of marine fish species. Marine Genomics, 39, 64-72. https://doi.org/10.1016/j.margen.2018.02.002
@article{1ec3ec71019643f0be723fa7ad65e2d4,
title = "Performance and precision of double digestion RAD (ddRAD) genotyping in large multiplexed datasets of marine fish species",
abstract = "The development of Genotyping-By-Sequencing (GBS) technologies enables cost-effective analysis of large numbers of Single Nucleotide Polymorphisms (SNPs), especially in {"}non-model{"} species. Nevertheless, as such technologies enter a mature phase, biases and errors inherent to GBS are becoming evident. Here, we evaluated the performance of double digest Restriction enzyme Associated DNA (ddRAD) sequencing in SNP genotyping studies including a high number of samples. Datasets of sequence data were generated from three marine teleost species (>5500 samples, >2.5 $\times$ 10$^{12}$ bases in total), using a standardized protocol. A common bioinformatics pipeline based on STACKS was established, with and without the use of a reference genome. We performed analyses throughout the production and analysis of ddRAD data in order to explore (i) the loss of information due to heterogeneous raw read number across samples; (ii) the discrepancy between expected and observed tag length and coverage; (iii) the performance of reference-based vs. de novo approaches; (iv) the sources of potential genotyping errors of the library preparation/bioinformatics protocol, by comparing technical replicates. Our results showed that use of a reference genome and a posteriori genotype correction improved genotyping precision. Individual read coverage was a key variable for reproducibility; variance in sequencing depth between loci in the same individual was also identified as an important factor and found to correlate with tag length. A comparison of downstream analysis carried out with ddRAD vs. single SNP allele-specific assay genotypes provided information about the levels of genotyping imprecision that can have a significant impact on allele frequency estimations and population assignment. The results and insights presented here will help to select and improve approaches to the analysis of large datasets based on RAD-like methodologies.",
keywords = "European sea bass, GBS, Gilthead sea bream, Sequencing precision, Turbot, ddRAD",
author = "F. Maroso and Hillen, {J E J} and Pardo, {B. G.} and K. Gkagkavouzis and I. Coscia and M. Hermida and R. Franch and B. Hellemans and {Van Houdt}, J. and B. Simionati and Taggart, {J. B.} and Nielsen, {Einar Eg} and G. Maes and Ciavaglia, {S. A.} and Webster, {L. M. I.} and Volckaert, {F. A. M.} and P. Martinez and L. Bargelloni and R. Ogden and Consortium AquaTrace",
year = "2018",
doi = "10.1016/j.margen.2018.02.002",
language = "English",
volume = "39",
pages = "64--72",
journal = "Marine Genomics",
issn = "1874-7787",
publisher = "Elsevier",

}
