TY - JOUR
T1 - Functional annotation of enzyme-encoding genes using deep learning with transformer layers
AU - Kim, Gi Bae
AU - Kim, Ji Yeon
AU - Lee, Jong An
AU - Norsigian, Charles J.
AU - Palsson, Bernhard O.
AU - Lee, Sang Yup
N1 - Publisher Copyright:
© 2023, The Author(s).
PY - 2023
Y1 - 2023
N2 - Functional annotation of open reading frames in microbial genomes
remains substantially incomplete. Enzymes constitute the most prevalent
functional gene class in microbial genomes and can be described by their
specific catalytic functions using the Enzyme Commission (EC) number.
Consequently, the ability to predict EC numbers could substantially
reduce the number of un-annotated genes. Here we present a deep learning
model, DeepECtransformer, which utilizes transformer layers as a neural
network architecture to predict EC numbers. Using the extensively
studied Escherichia coli K-12 MG1655 genome, DeepECtransformer
predicted EC numbers for 464 un-annotated genes. We experimentally
validated the enzymatic activities predicted for three proteins (YgfF,
YciO, and YjdM). Further examination of the neural network’s reasoning
process revealed that the trained neural network relies on functional
motifs of enzymes to predict EC numbers. Thus, DeepECtransformer is a
method that facilitates the functional annotation of uncharacterized
genes.
U2 - 10.1038/s41467-023-43216-z
DO - 10.1038/s41467-023-43216-z
M3 - Journal article
C2 - 37963869
AN - SCOPUS:85176391236
SN - 2041-1723
VL - 14
JO - Nature Communications
JF - Nature Communications
M1 - 7370
ER -