Abstract
Machine learning models trained on protein data tend to underperform because annotated data are scarce. Recent research has shown that language models (LMs) trained on unlabeled protein sequences can improve performance on protein prediction tasks. However, protein LMs have not been fully studied, and their full capabilities are yet to be explored. A protein LM can be defined as a model that predicts the next amino acid given the amino acids that precede it. In this research, we focus on assembling a high-quality protein dataset suitable for protein language modelling and on training a recurrent neural language model on this dataset. We show that the protein LM learns to predict the next amino acid in a sequence and produces amino acid representations that are context dependent. In addition, our protein LM can estimate the probability of a protein sequence, which allows it to discriminate between real and fake proteins. Finally, we show that our model can also generate new protein sequences with features similar to those of real proteins.
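The definition above (predict the next amino acid from its preceding context) implies that a sequence's probability is the product of its per-residue conditional probabilities, and that new sequences can be drawn one residue at a time. The following is a minimal sketch of that idea only. It assumes a first-order (bigram) count model with add-one smoothing as a stand-in for the recurrent network described in the abstract; the class name `BigramProteinLM`, the boundary tokens, and the training strings are hypothetical and not taken from the work.

```python
import math
import random
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
START, END = "^", "$"                  # hypothetical boundary tokens


class BigramProteinLM:
    """Toy autoregressive LM: P(next residue | previous residue).

    Assumption: a bigram count model replaces the recurrent network,
    so the "context" here is only the single preceding residue.
    """

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = list(AMINO_ACIDS) + [END]

    def fit(self, sequences):
        # Count every (previous, next) pair, including the boundaries.
        for seq in sequences:
            tokens = [START] + list(seq) + [END]
            for prev, nxt in zip(tokens, tokens[1:]):
                self.counts[prev][nxt] += 1

    def prob(self, prev, nxt):
        # Add-one smoothing keeps unseen transitions at non-zero probability.
        total = sum(self.counts[prev].values())
        return (self.counts[prev][nxt] + 1) / (total + len(self.vocab))

    def log_prob(self, seq):
        # Chain rule: log P(seq) = sum over t of log P(x_t | x_{t-1}).
        tokens = [START] + list(seq) + [END]
        return sum(math.log(self.prob(p, n)) for p, n in zip(tokens, tokens[1:]))

    def sample(self, rng, max_len=50):
        # Draw one residue at a time until END is sampled or max_len is hit.
        out, prev = [], START
        while len(out) < max_len:
            weights = [self.prob(prev, tok) for tok in self.vocab]
            nxt = rng.choices(self.vocab, weights=weights)[0]
            if nxt == END:
                break
            out.append(nxt)
            prev = nxt
        return "".join(out)


# Made-up training strings, for illustration only.
model = BigramProteinLM()
model.fit(["MKTAYIAKQR", "MKVLAAGIVG", "MKTLLLTLVV"])

print(model.log_prob("MKTAYIAK"))      # resembles the training strings
print(model.log_prob("KYAMTIAK"))      # same residues, scrambled order
print(model.sample(random.Random(0)))  # a newly generated sequence
```

In this toy setting the sequence that resembles the training data receives a higher log-probability than its scrambled counterpart, which is the kind of real-versus-fake discrimination the abstract refers to. A recurrent model would condition on the whole preceding context rather than on one residue.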
Original language | English |
---|---|
Publication date | 2019 |
Number of pages | 1 |
Publication status | Published - 2019 |
Event | ISMB/ECCB 27th Conference on Intelligent Systems for Molecular Biology and the 18th European Conference on Computational Biology, Basel, Switzerland. Duration: 21 Jul 2019 → 25 Jul 2019. Conference number: 27. https://www.iscb.org/ismbeccb2019 |
Conference
Conference | ISMB/ECCB 27th Conference on Intelligent Systems for Molecular Biology and the 18th European Conference on Computational Biology |
---|---|
Number | 27 |
Country/Territory | Switzerland |
City | Basel |
Period | 21/07/2019 → 25/07/2019 |
Internet address | https://www.iscb.org/ismbeccb2019 |
Activities
- 1 Conference presentations
- Learning the language of life
  Almagro Armenteros, J. J. (Guest lecturer), Nielsen, H. (Other), Johansen, A. R. (Other) & Winther, O. (Other)
  25 Jul 2019. Activity: Talks and presentations › Conference presentations