Alpha diversity metrics for noisy OTUs

Robert C. Edgar, Henrik Flyvbjerg

    Research output: Contribution to journal; Journal article; Research

    Abstract

    Next-generation sequencing (NGS) of marker genes such as 16S ribosomal RNA is widely used to survey microbial communities. The in-sample (alpha) diversity of Operational Taxonomic Units (OTUs) is often summarized by metrics such as richness or entropy which are calculated from observed abundances, or by estimators such as Chao1 which extrapolate to unobserved OTUs. Most such measures are adopted from traditional biodiversity studies, where observational error can often be neglected. However, errors introduced by next-generation amplicon sequencing tend to induce spurious OTUs and spurious counts in OTU tables, both of which are especially prevalent at low abundances. In consequence, traditional metrics may be grossly inaccurate if they are naively applied to NGS OTU tables. In this work, we describe two novel alpha diversity estimators which are calculated from OTU abundances above a specified threshold. The singleton-free estimator (SFE) is a non-parametric estimator which is derived from a similar approach to Chao1 but extrapolates using doublet and triplet abundances rather than singletons and doublets. The octave estimator (OE) fits a log-normal distribution to non-singleton bars of an octave plot. We show that these estimators are effective under suitable conditions, but these conditions rarely apply in practice. We conclude that extrapolating to unobserved OTUs remains an open problem which is unlikely to be solved in the near future.
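
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the classic Chao1 form S_obs + F1^2 / (2 F2), with the bias-corrected form as a fallback when there are no doublets, and it assumes that octave bins group abundances by powers of two (1, 2-3, 4-7, ...). The function names and the `min_abundance` parameter are hypothetical and chosen for illustration only. The SFE and the log-normal fit of the OE are not reproduced here because the abstract does not give their formulas.

```python
import math
from collections import Counter


def observed_metrics(counts, min_abundance=1):
    """Richness and Shannon entropy (natural log) of OTUs at or above a threshold."""
    kept = [c for c in counts if c >= min_abundance]
    total = sum(kept)
    richness = len(kept)
    entropy = -sum((c / total) * math.log(c / total) for c in kept) if total else 0.0
    return richness, entropy


def chao1(counts):
    """Classic Chao1 estimate from singleton (F1) and doublet (F2) frequencies."""
    kept = [c for c in counts if c > 0]
    freq = Counter(kept)
    f1, f2 = freq[1], freq[2]
    s_obs = len(kept)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Assumed fallback: bias-corrected form evaluated at F2 = 0.
    return s_obs + f1 * (f1 - 1) / 2


def octave_histogram(counts):
    """Count OTUs per octave; bin k holds abundances in [2**k, 2**(k+1) - 1]."""
    return dict(Counter(c.bit_length() - 1 for c in counts if c > 0))


if __name__ == "__main__":
    otu_counts = [1, 1, 1, 2, 2, 5, 8, 40]  # made-up abundances for illustration
    print(observed_metrics(otu_counts, min_abundance=2))
    print(chao1(otu_counts))
    print(octave_histogram(otu_counts))
```

Raising `min_abundance` to 2 discards singletons before the observed metrics are computed, which is the kind of thresholding the abstract describes. Bin 0 of the octave histogram holds the singletons, so the non-singleton bars are bins 1 and above.
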
    Original language: English
    Journal: bioRxiv
    Number of pages: 21
    Publication status: Published - 2018
