SecretoGen: towards prediction of signal peptides for efficient protein secretion

Felix Teufel, Carsten Stahlhut, Jan Christian Refsgaard, Henrik Nielsen, Ole Winther, Dennis Madsen

Research output: Contribution to conference; Paper; Research; peer-review

Abstract

Signal peptides (SPs) are short sequences at the N terminus of proteins that control their secretion in all living organisms. Secretion is of great importance in biotechnology, as industrial production of proteins in host organisms often requires the proteins to be secreted. SPs have varying secretion efficiency that depends both on the host organism and on the protein they are combined with. Therefore, to optimize production yields, an SP with good efficiency needs to be identified for each protein. While SPs can be predicted accurately by machine learning models, such models have so far shown limited utility for predicting secretion efficiency. We introduce SecretoGen, a generative transformer trained on millions of naturally occurring SPs from diverse organisms. Evaluation on a range of secretion efficiency datasets shows that SecretoGen’s perplexity has promising performance for selecting efficient SPs, without requiring training on experimental efficiency data.
Original language: English
Publication date: 2023
Number of pages: 10
Publication status: Published - 2023
Event: 37th Annual Conference on Neural Information Processing Systems - Ernest N. Morial Convention Center, New Orleans, United States
Duration: 10 Dec 2023 to 16 Dec 2023
Conference number: 37
