Improved Immunoinformatic Methods for Rationale T Cell Epitope Discovery

Alessandro Montemurro

Research output: Book/Report › Ph.D. thesis



The research presented in this doctoral thesis involves the development of data-driven methods for understanding the mechanisms behind T cell recognition and predicting T cell specificity.
T cells play a crucial role in adaptive immunity as they are able to detect the presence of pathogens or malignant cell mutations. T cells engage with other cells through the T cell receptor (TCR), and TCRs interact with the peptide-MHC complexes expressed on the cell surface. Upon detection of foreign antigens or malfunctioning self-antigens, T cells trigger a cascade of events that leads to the elimination of the malfunctioning cells. To ensure protection against the broadest possible variety of pathogens, the immune system has evolved to generate a highly diverse TCR repertoire. This diversity is achieved through a stochastic process of TCR generation. TCR repertoire diversity is what makes the immune system very powerful, but it also makes it challenging to extract common rules governing TCR-epitope recognition.
The first part of the thesis gives an overview of the theoretical background to its topics, followed by three research projects. The thesis concludes with an epilogue summarizing the main findings of the research and future perspectives.
In the first published work we proposed NetTCR-2.0, a convolutional neural network trained on TCR and epitope amino acid sequences. We successfully built a model able to predict binding between a TCR and a peptide presented by the MHC I molecule HLA-A*02:01. We trained the neural network using both α- and β-chain CDR3 loops, showing that this method consistently outperformed models trained on single-chain inputs. Subsequently, we expanded the proposed model to include the full set of six CDR sequences as input, showing that this yields a gain in performance. Furthermore, as new data were released, NetTCR-2.1 was trained on a larger dataset covering more HLA molecules. Special attention was given to data curation during model development. We defined a pipeline to pre-process the input data and prevent performance inflation due to data redundancy. The pipeline also included an analysis of how to artificially generate a set of negative interactions, as these are usually not available.
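As an illustration of the negative-generation step mentioned above, the following is a minimal sketch of one common strategy: pairing each TCR with peptides it is not annotated to bind. The abstract does not state which strategy the thesis settled on, so the function name `make_negative_pairs`, its parameters, and the CDR3 sequences are hypothetical placeholders, not the thesis pipeline.

```python
import random


def make_negative_pairs(positives, n_per_positive=1, seed=0):
    """Build artificial negatives by mispairing TCRs and peptides.

    positives: list of (cdr3, peptide) tuples annotated as binders.
    Returns (cdr3, peptide) tuples that do not occur in `positives`.
    Assumption: an unobserved pair is treated as a non-binder.
    """
    rng = random.Random(seed)
    positive_set = set(positives)
    peptides = sorted({peptide for _, peptide in positives})
    negatives = []
    for cdr3, _ in positives:
        # Peptides this TCR is not annotated to bind.
        candidates = [p for p in peptides if (cdr3, p) not in positive_set]
        k = min(n_per_positive, len(candidates))
        negatives.extend((cdr3, p) for p in rng.sample(candidates, k))
    return negatives


# Illustrative input: the CDR3 strings are placeholders for this sketch.
positives = [
    ("CASSIRSSYEQYF", "GILGFVFTL"),
    ("CASSLAPGATNEKLFF", "NLVPMVATV"),
    ("CSARDGTGNGYTF", "GLCTLVAML"),
]
negatives = make_negative_pairs(positives, n_per_positive=2)
```

Mispaired negatives of this kind can contain true but unobserved binders, which is one reason the choice of negative-generation scheme needs to be analysed rather than assumed.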
The final research project reported in this dissertation presents the results from an ongoing project and proposes an application of the NetTCR method described in the previous research papers. Given the potentially large amount of data generated with single-cell RNA sequencing platforms, filtering pipelines are being developed to remove artifacts and noisy data points from the dataset. We presented two data-driven filtering approaches, ICON and ATRAP, and compared their ability to filter the data. We concluded that the two pipelines successfully filter out noisy TCR-peptide annotations, retaining only the most reliable interactions. We confirmed this by training a neural network on the raw and the filtered data, showing that the models trained on the cleaned dataset yield improved performance.
As a whole, the presented work aims to uncover the mechanisms behind TCR recognition and provides a computational framework to predict TCR-peptide interactions. Being able to predict T cell specificity will make it easier to create novel strategies for the treatment of infections, autoimmune diseases, and cancer.
Original language: English
Publisher: DTU Health Technology
Number of pages: 160
Publication status: Published - 2022

