Peptide-based functional annotation of carbohydrate-active enzymes by conserved unique peptide patterns (CUPP)

Kristian Barrett*, Lene Lange

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

Background
Insight into the function of carbohydrate-active enzymes is required to understand their biological role and industrial potential. There is a need for better use of the ample genomic data in order to enable selection of the most interesting proteins for further studies. The basis for elaborating a new approach to sequence analysis is the hypothesis that when using conserved peptide patterns to determine the similarities between proteins, the exact spacing between conserved adjacent amino acids in the proteins plays a prominent functional role. Thus, the objective of developing the method of conserved unique peptide patterns (CUPP) is to construct a peptide-based grouping and validate the method to provide evidence that CUPP captures function-related features of the individual carbohydrate-active enzymes (as defined by CAZy families). This approach facilitates grouping of enzymes at a level lower than protein families and/or subfamilies. A standardized, efficient, and robust approach to functional annotation of carbohydrate-active enzymes would support improved molecular insight into enzyme–substrate interaction.
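The spacing hypothesis above can be illustrated with a toy example. The following is a minimal sketch, not the CUPP implementation: the function names, the window length of 8, and the choice of keeping 4 residues per window are all assumptions made for illustration. It builds gapped peptide patterns in which the skipped positions are masked with `x`, so two proteins only share a pattern when the kept residues sit at exactly the same distances from each other.

```python
from itertools import combinations


def gapped_peptides(sequence, window=8, kept=4):
    """Return the set of gapped patterns found in `sequence`.

    Illustrative assumption: every `window`-long fragment contributes one
    pattern per way of keeping `kept` residues (the first residue is always
    kept as an anchor); all other positions are masked with 'x', which
    preserves the exact spacing between the kept residues.
    """
    patterns = set()
    for start in range(len(sequence) - window + 1):
        fragment = sequence[start:start + window]
        for positions in combinations(range(1, window), kept - 1):
            keep = {0, *positions}
            patterns.add(
                "".join(aa if i in keep else "x" for i, aa in enumerate(fragment))
            )
    return patterns


def shared_pattern_fraction(seq_a, seq_b, window=8, kept=4):
    """Fraction of gapped patterns shared by two sequences (0.0 to 1.0)."""
    a = gapped_peptides(seq_a, window, kept)
    b = gapped_peptides(seq_b, window, kept)
    if not a or not b:
        return 0.0  # a sequence shorter than the window yields no patterns
    return len(a & b) / min(len(a), len(b))


if __name__ == "__main__":
    # Hypothetical sequences: identical except for the last residue.
    print(shared_pattern_fraction("ACDEFGHK", "ACDEFGHR"))
```

In this sketch a single substitution removes only the patterns that include the changed position, whereas an insertion shifts the spacing and removes every pattern spanning it, which is the kind of sensitivity to exact spacing the hypothesis refers to.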

Results
A new nonalignment-based clustering and functional annotation tool was developed that uses conserved unique peptide patterns to perform automated clustering of proteins and formation of protein groups. A peptide-based model was constructed for each of these protein CUPP groups to be used to automatically annotate protein family, subfamily, and EC function of carbohydrate-active enzymes. CUPP prediction can annotate proteins (from any CAZy family) with high F-score to existing family (0.966), subfamily (0.961), and EC-function (0.843). The speed of the CUPP program was estimated and exemplified by prediction of the 504,017 nonredundant proteins of CAZy in less than four CPU hours.
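For readers unfamiliar with the reported metrics, the sketch below shows how an F-score is commonly derived from precision and sensitivity, and what lower bound on throughput the stated runtime implies. It assumes the standard balanced F-score (beta = 1); the abstract does not state which variant was used, and the function names and the example counts are hypothetical.

```python
def precision_sensitivity(tp, fp, fn):
    """Precision and sensitivity (recall) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return precision, sensitivity


def f_score(precision, sensitivity, beta=1.0):
    """General F-beta score; beta=1 gives the harmonic mean of the two inputs."""
    denominator = beta ** 2 * precision + sensitivity
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * sensitivity / denominator


def min_throughput(n_proteins, cpu_hours):
    """Proteins per CPU second implied by finishing within `cpu_hours`."""
    return n_proteins / (cpu_hours * 3600)


if __name__ == "__main__":
    # Hypothetical counts, only to show the calculation.
    p, s = precision_sensitivity(tp=8, fp=2, fn=2)
    print(f_score(p, s))
    # Figures from the abstract: 504,017 proteins in less than four CPU hours.
    print(min_throughput(504_017, 4))
```

Because the abstract reports "less than four CPU hours", the throughput computed from exactly four hours is a lower bound, roughly 35 proteins per CPU second.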

Conclusion
It was possible to construct an automated system for clustering proteins within families and to use the resulting CUPP groups to directly build peptide-based models for genome annotation. The CUPP runtime, F-score, sensitivity, and precision of family and subfamily annotations match or improve on those of state-of-the-art tools. The speed of CUPP annotation is similar to that of the rapid DIAMOND annotation tool. CUPP facilitates automated annotation of full genome assemblies to any CAZy family.

Original language: English
Article number: 102
Journal: Biotechnology for Biofuels
Volume: 12
Number of pages: 21
ISSN: 1754-6834
DOI: 10.1186/s13068-019-1436-5
Publication status: Published - 2019

Cite this

@article{edad5c96c585413fba87f0d5dd51c0fb,
title = "Peptide-based functional annotation of carbohydrate-active enzymes by conserved unique peptide patterns (CUPP)",
author = "Kristian Barrett and Lene Lange",
year = "2019",
doi = "10.1186/s13068-019-1436-5",
language = "English",
volume = "12",
journal = "Biotechnology for Biofuels",
issn = "1754-6834",
publisher = "BioMed Central Ltd.",

}

Peptide-based functional annotation of carbohydrate-active enzymes by conserved unique peptide patterns (CUPP). / Barrett, Kristian; Lange, Lene.

In: Biotechnology for Biofuels, Vol. 12, 102, 2019.

TY - JOUR

T1 - Peptide-based functional annotation of carbohydrate-active enzymes by conserved unique peptide patterns (CUPP)

AU - Barrett, Kristian

AU - Lange, Lene

PY - 2019

Y1 - 2019

U2 - 10.1186/s13068-019-1436-5

DO - 10.1186/s13068-019-1436-5

M3 - Journal article

VL - 12

JO - Biotechnology for Biofuels

JF - Biotechnology for Biofuels

SN - 1754-6834

M1 - 102

ER -