BGCFlow: systematic pangenome workflow for the analysis of biosynthetic gene clusters across large genomic datasets

Research output: Contribution to journal › Journal article › Research › peer-review


Abstract

Genome mining is revolutionizing natural products discovery efforts. The rapid increase in available genomes demands comprehensive computational platforms to effectively extract biosynthetic knowledge encoded across bacterial pangenomes. Here, we present BGCFlow, a novel systematic workflow integrating analytics for large-scale genome mining of bacterial pangenomes. BGCFlow incorporates several genome analytics and mining tools grouped into five common stages of analysis: (i) data selection, (ii) functional annotation, (iii) phylogenetic analysis, (iv) genome mining, and (v) comparative analysis. Furthermore, BGCFlow provides easy configuration of different projects, parallel distribution, scheduled job monitoring, an interactive database to visualize tables, exploratory Jupyter Notebooks, and customized reports. We demonstrate the application of BGCFlow by investigating the phylogenetic distribution of various biosynthetic gene clusters (BGCs) detected across 42 genomes of the Saccharopolyspora genus, which is known to produce industrially important secondary/specialized metabolites. The BGCFlow-guided analysis predicted more accurate dereplication of BGCs and guided the targeted comparative analysis of selected RiPPs. The scalable, interoperable, adaptable, re-entrant, and reproducible nature of BGCFlow will provide an effective new way to extract biosynthetic knowledge from the ever-growing genomic datasets of biotechnologically relevant bacterial species.

Original language: English
Journal: Nucleic Acids Research
Volume: 52
Issue number: 10
Pages (from-to): 5478-5495
ISSN: 0305-1048
Publication status: Published - 2024
