
Project Details

Layman's description

The central dogma of molecular biology tells us that DNA is transcribed into RNA, which is then translated into proteins – the molecules that carry out functions in living cells. Here one often assumes there exists a one-to-one mapping between a specific gene in the DNA and the protein that is produced. However, this is not the full story. Every gene in the genome consists of coding and non-coding regions, which are called exons and introns, respectively. When a gene is transcribed into RNA, it first exists as a pre-mRNA molecule, which contains both introns and exons. Which exons are retained in the mature mRNA molecule is regulated by RNA and protein complexes during a process called alternative splicing. Simply put, the exons can be joined in different combinations, which in turn leads to different mRNA isoforms. The proteins translated from these different mRNA isoforms may contain differences in the amino acid sequence. This means that a single gene can produce multiple protein isoforms, i.e. different versions of a protein with potentially distinct biological functions. In fact, over 75% of human genes generate multiple isoforms, and which ones are expressed can differ between tissues, cell types, and even disease states. Understanding this layer of regulation is essential for deciphering how our cells work and what goes wrong in disease.

Modern sequencing technologies allow us to measure how genes are expressed in individual cells, but directly quantifying isoform expression remains technically difficult and expensive. In contrast, gene-level expression data are abundant, with millions of publicly available samples. This creates an opportunity: if we can accurately infer isoform expression from gene expression, we could unlock a much deeper understanding of biology using existing data.

This project aims to solve this problem by developing machine learning models that learn the rules governing isoform usage. As described above, which exons are retained in the mature mRNA molecule is controlled by other proteins and RNAs, so certain splicing-related genes tightly regulate the isoform usage of others. We therefore believe that machine learning approaches will be able to learn the rules of this fundamental regulation. If successful, this approach could reveal the hidden layer of isoform regulation and make isoform-level insights accessible for millions of existing gene expression samples.
Status: Active
Effective start/end date: 15/03/2025 to 14/03/2028
