Specificity prediction of adenylation domains in nonribosomal peptide synthetases (NRPS) using transductive support vector machines (TSVMs): Nucleic Acids Research

C. Rausch, Tilmann Weber, O. Kohlbacher, W. Wohlleben, D. H. Huson

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

We present a new support vector machine (SVM)-based approach to predict the substrate specificity of subtypes of a given protein sequence family. We demonstrate the usefulness of this method on the example of aryl acid-activating and amino acid-activating adenylation domains (A domains) of nonribosomal peptide synthetases (NRPS). The residues of gramicidin synthetase A that are 8 Å around the substrate amino acid and corresponding positions of other adenylation domain sequences with 397 known and unknown specificities were extracted and used to encode this physico-chemical fingerprint into normalized real-valued feature vectors based on the physico-chemical properties of the amino acids. The SVM software package SVMlight was used for training and classification, with transductive SVMs to take advantage of the information inherent in unlabeled data. Specificities for very similar substrates that frequently show cross-specificities were pooled to the so-called composite specificities and predictive models were built for them. The reliability of the models was confirmed in cross-validations and in comparison with a currently used sequence-comparison-based method. When comparing the predictions for 1230 NRPS A domains that are currently detectable in UniProt, the new method was able to give a specificity prediction in an additional 18% of the cases compared with the old method. For 70% of the sequences both methods agreed, for
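The encoding step described in the abstract can be illustrated with a minimal sketch. The abstract does not list which physico-chemical properties were used, so the two below (Kyte-Doolittle hydropathy and a simplified side-chain charge) are assumptions chosen only for illustration, as are the function names `zscore`, `encode` and `to_svmlight_line` and the short example residue string. The output line follows the SVMlight input format, in which a target of 0 marks an unlabeled example for transductive training.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy index (assumed property; not named in the abstract).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge near neutral pH (assumption; His treated as neutral).
CHARGE = {aa: 0.0 for aa in HYDROPATHY}
CHARGE.update({"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0})


def zscore(table):
    """Normalize one property to zero mean and unit variance over the 20 amino acids."""
    mu = mean(table.values())
    sd = pstdev(table.values())
    return {aa: (value - mu) / sd for aa, value in table.items()}


PROPERTIES = [zscore(HYDROPATHY), zscore(CHARGE)]


def encode(signature):
    """Concatenate the normalized property values of every residue in the signature."""
    vector = []
    for residue in signature.upper():
        for prop in PROPERTIES:
            # Gaps and unknown symbols map to 0.0, the mean of a z-scored property.
            vector.append(prop.get(residue, 0.0))
    return vector


def to_svmlight_line(vector, label=0):
    """Format one example as '<target> 1:<v1> 2:<v2> ...'; target 0 means unlabeled."""
    target = f"{label:+d}" if label else "0"
    features = " ".join(f"{i}:{v:.4f}" for i, v in enumerate(vector, start=1))
    return f"{target} {features}"


if __name__ == "__main__":
    signature = "DAWTIAAICK"  # short illustrative residue string, not the paper's full signature
    print(to_svmlight_line(encode(signature), label=1))  # labeled positive example
    print(to_svmlight_line(encode(signature), label=0))  # unlabeled, for transduction
```

Each signature position contributes one value per property, so a signature of n residues yields a vector of length 2n in this sketch; a one-vs-rest model per (composite) specificity would then be trained on such lines.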
Original language: English
Journal: Nucleic Acids Research
Volume: 33
Pages (from-to): 5799-5808
Number of pages: 10
ISSN: 0305-1048
Publication status: Published - 2005
Externally published: Yes

Keywords

  • Acids amino acid Amino Acids classification domains Gramicidin method methods peptide peptide synthetase Peptide Synthetases properties protein sequence Software specificity Substrate Specificity vector
