TY - JOUR
T1 - BGCFlow
T2 - systematic pangenome workflow for the analysis of biosynthetic gene clusters across large genomic datasets
AU - Nuhamunada, Matin
AU - Mohite, Omkar S.
AU - Phaneuf, Patrick V.
AU - Palsson, Bernhard O.
AU - Weber, Tilmann
N1 - Publisher Copyright:
© 2024 The Author(s). Published by Oxford University Press on behalf of Nucleic Acids Research.
PY - 2024
Y1 - 2024
N2 - Genome mining is revolutionizing natural products discovery efforts. The rapid increase in available genomes demands comprehensive computational platforms to effectively extract biosynthetic knowledge encoded across bacterial pangenomes. Here, we present BGCFlow, a novel systematic workflow integrating analytics for large-scale genome mining of bacterial pangenomes. BGCFlow incorporates several genome analytics and mining tools grouped into five common stages of analysis such as: (i) data selection, (ii) functional annotation, (iii) phylogenetic analysis, (iv) genome mining, and (v) comparative analysis. Furthermore, BGCFlow provides easy configuration of different projects, parallel distribution, scheduled job monitoring, an interactive database to visualize tables, exploratory Jupyter Notebooks, and customized reports. Here, we demonstrate the application of BGCFlow by investigating the phylogenetic distribution of various biosynthetic gene clusters detected across 42 genomes of the Saccharopolyspora genus, known to produce industrially important secondary/specialized metabolites. The BGCFlow-guided analysis predicted more accurate dereplication of BGCs and guided the targeted comparative analysis of selected RiPPs. The scalable, interoperable, adaptable, re-entrant, and reproducible nature of the BGCFlow will provide an effective novel way to extract the biosynthetic knowledge from the ever-growing genomic datasets of biotechnologically relevant bacterial species.
AB - Genome mining is revolutionizing natural products discovery efforts. The rapid increase in available genomes demands comprehensive computational platforms to effectively extract biosynthetic knowledge encoded across bacterial pangenomes. Here, we present BGCFlow, a novel systematic workflow integrating analytics for large-scale genome mining of bacterial pangenomes. BGCFlow incorporates several genome analytics and mining tools grouped into five common stages of analysis such as: (i) data selection, (ii) functional annotation, (iii) phylogenetic analysis, (iv) genome mining, and (v) comparative analysis. Furthermore, BGCFlow provides easy configuration of different projects, parallel distribution, scheduled job monitoring, an interactive database to visualize tables, exploratory Jupyter Notebooks, and customized reports. Here, we demonstrate the application of BGCFlow by investigating the phylogenetic distribution of various biosynthetic gene clusters detected across 42 genomes of the Saccharopolyspora genus, known to produce industrially important secondary/specialized metabolites. The BGCFlow-guided analysis predicted more accurate dereplication of BGCs and guided the targeted comparative analysis of selected RiPPs. The scalable, interoperable, adaptable, re-entrant, and reproducible nature of the BGCFlow will provide an effective novel way to extract the biosynthetic knowledge from the ever-growing genomic datasets of biotechnologically relevant bacterial species.
U2 - 10.1093/nar/gkae314
DO - 10.1093/nar/gkae314
M3 - Journal article
C2 - 38686794
AN - SCOPUS:85195707988
SN - 0305-1048
VL - 52
SP - 5478
EP - 5495
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 10
ER -