Recent development of antiSMASH and other computational approaches to mine secondary metabolite biosynthetic gene clusters

Kai Blin, Hyun Uk Kim, Marnix H. Medema, Tilmann Weber*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

Many drugs are derived from small molecules produced by microorganisms and plants, so-called natural products. Natural products have diverse chemical structures, but the biosynthetic pathways producing those compounds are often organized as biosynthetic gene clusters (BGCs) and follow a highly conserved biosynthetic logic. This allows for the identification of core biosynthetic enzymes using genome mining strategies that are based on the sequence similarity of the involved enzymes/genes. However, mining for a variety of BGCs quickly approaches a complexity level where manual analyses are no longer feasible and automated genome mining pipelines, such as the antiSMASH software, are required. In this review, we discuss the principles underlying the predictions of antiSMASH and other tools and provide practical advice for their application. Furthermore, we discuss important caveats such as rule-based BGC detection, sequence and annotation quality, and cluster boundary prediction, all of which have to be considered while planning, performing and analyzing the results of genome mining studies.
Original language: English
Journal: Briefings in Bioinformatics
Volume: 20
Issue number: 4
Pages (from-to): 1103-1113
ISSN: 1467-5463
DOI: 10.1093/bib/bbx146
Publication status: Published - 2019

Keywords

  • Genome mining
  • Biosynthetic gene cluster
  • Antibiotics
  • Secondary metabolites
  • Natural products
  • AntiSMASH

Cite this

@article{ec83b4a9e6244807a1a215ca26ff5e53,
title = "Recent development of antiSMASH and other computational approaches to mine secondary metabolite biosynthetic gene clusters",
abstract = "Many drugs are derived from small molecules produced by microorganisms and plants, so-called natural products. Natural products have diverse chemical structures, but the biosynthetic pathways producing those compounds are often organized as biosynthetic gene clusters (BGCs) and follow a highly conserved biosynthetic logic. This allows for the identification of core biosynthetic enzymes using genome mining strategies that are based on the sequence similarity of the involved enzymes/genes. However, mining for a variety of BGCs quickly approaches a complexity level where manual analyses are no longer possible and require the use of automated genome mining pipelines, such as the antiSMASH software. In this review, we discuss the principles underlying the predictions of antiSMASH and other tools and provide practical advice for their application. Furthermore, we discuss important caveats such as rule-based BGC detection, sequence and annotation quality and cluster boundary prediction, which all have to be considered while planning for, performing and analyzing the results of genome mining studies.",
keywords = "Genome mining, Biosynthetic gene cluster, Antibiotics, Secondary metabolites, Natural products, AntiSMASH",
author = "Kai Blin and Kim, {Hyun Uk} and Medema, {Marnix H.} and Tilmann Weber",
year = "2019",
doi = "10.1093/bib/bbx146",
language = "English",
volume = "20",
pages = "1103--1113",
journal = "Briefings in Bioinformatics",
issn = "1467-5463",
publisher = "Oxford University Press",
number = "4",

}

Recent development of antiSMASH and other computational approaches to mine secondary metabolite biosynthetic gene clusters. / Blin, Kai; Kim, Hyun Uk; Medema, Marnix H.; Weber, Tilmann.

In: Briefings in Bioinformatics, Vol. 20, No. 4, 2019, p. 1103-1113.

TY - JOUR

T1 - Recent development of antiSMASH and other computational approaches to mine secondary metabolite biosynthetic gene clusters

AU - Blin, Kai

AU - Kim, Hyun Uk

AU - Medema, Marnix H.

AU - Weber, Tilmann

PY - 2019

Y1 - 2019

N2 - Many drugs are derived from small molecules produced by microorganisms and plants, so-called natural products. Natural products have diverse chemical structures, but the biosynthetic pathways producing those compounds are often organized as biosynthetic gene clusters (BGCs) and follow a highly conserved biosynthetic logic. This allows for the identification of core biosynthetic enzymes using genome mining strategies that are based on the sequence similarity of the involved enzymes/genes. However, mining for a variety of BGCs quickly approaches a complexity level where manual analyses are no longer possible and require the use of automated genome mining pipelines, such as the antiSMASH software. In this review, we discuss the principles underlying the predictions of antiSMASH and other tools and provide practical advice for their application. Furthermore, we discuss important caveats such as rule-based BGC detection, sequence and annotation quality and cluster boundary prediction, which all have to be considered while planning for, performing and analyzing the results of genome mining studies.

AB - Many drugs are derived from small molecules produced by microorganisms and plants, so-called natural products. Natural products have diverse chemical structures, but the biosynthetic pathways producing those compounds are often organized as biosynthetic gene clusters (BGCs) and follow a highly conserved biosynthetic logic. This allows for the identification of core biosynthetic enzymes using genome mining strategies that are based on the sequence similarity of the involved enzymes/genes. However, mining for a variety of BGCs quickly approaches a complexity level where manual analyses are no longer possible and require the use of automated genome mining pipelines, such as the antiSMASH software. In this review, we discuss the principles underlying the predictions of antiSMASH and other tools and provide practical advice for their application. Furthermore, we discuss important caveats such as rule-based BGC detection, sequence and annotation quality and cluster boundary prediction, which all have to be considered while planning for, performing and analyzing the results of genome mining studies.

KW - Genome mining

KW - Biosynthetic gene cluster

KW - Antibiotics

KW - Secondary metabolites

KW - Natural products

KW - AntiSMASH

U2 - 10.1093/bib/bbx146

DO - 10.1093/bib/bbx146

M3 - Journal article

C2 - 29112695

VL - 20

SP - 1103

EP - 1113

JO - Briefings in Bioinformatics

JF - Briefings in Bioinformatics

SN - 1467-5463

IS - 4

ER -