Abstract
Signal peptides (SPs) are short sequences at the N terminus of proteins that control their secretion in all living organisms. Secretion is of great importance in biotechnology, as industrial production of proteins in host organisms often requires the proteins to be secreted. SPs vary in secretion efficiency, which depends on both the host organism and the protein they are combined with. Therefore, to optimize production yields, an efficient SP needs to be identified for each protein. While SPs can be predicted accurately by machine learning models, such models have so far shown limited utility for predicting secretion efficiency. We introduce SecretoGen, a generative transformer trained on millions of naturally occurring SPs from diverse organisms. Evaluation on a range of secretion efficiency datasets shows that SecretoGen’s perplexity is a promising criterion for selecting efficient SPs, without requiring training on experimental efficiency data.
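To illustrate how a perplexity score can be used to rank candidate SPs, here is a minimal sketch. It assumes that a generative model (such as SecretoGen) has already supplied a natural-log probability for each residue of every candidate SP, and that lower perplexity is read as "more promising". The function names, the input format, and the lower-is-better convention are hypothetical choices for illustration, not SecretoGen's published interface. Any conditioning on the host organism or the target protein is left out.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per residue), natural log."""
    if not token_log_probs:
        raise ValueError("need at least one residue log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def rank_signal_peptides(
    candidates: dict[str, Sequence[float]],
) -> list[tuple[str, float]]:
    """Return (SP id, perplexity) pairs, lowest perplexity first.

    Hypothetical helper: assumes lower perplexity marks a better candidate.
    """
    scored = [(sp_id, perplexity(lps)) for sp_id, lps in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Illustrative, made-up log-probabilities for two hypothetical candidates.
    demo = {
        "SP_a": [math.log(0.1)] * 5,
        "SP_b": [math.log(0.5)] * 5,
    }
    for sp_id, ppl in rank_signal_peptides(demo):
        print(sp_id, round(ppl, 2))
```

Averaging per residue, rather than summing, keeps candidates of different lengths comparable. A model that assigns uniform probability over the 20 standard amino acids would score a perplexity of 20.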
Original language | English |
---|---|
Publication date | 2023 |
Number of pages | 10 |
Publication status | Published - 2023 |
Event | 37th Annual Conference on Neural Information Processing Systems, Ernest N. Morial Convention Center, New Orleans, United States; Duration: 10 Dec 2023 → 16 Dec 2023; Conference number: 37 |
Conference
Conference | 37th Annual Conference on Neural Information Processing Systems |
---|---|
Number | 37 |
Location | Ernest N. Morial Convention Center |
Country/Territory | United States |
City | New Orleans |
Period | 10/12/2023 → 16/12/2023 |