Molecular representations in computational drug design - from early stage protein functions to machine learning based prediction of pharmacokinetic parameters

Kasper Alnor Einarson

Research output: Book/Report › Ph.D. thesis

Abstract

The drug discovery and development process is both costly and time-consuming due to factors such as in vivo animal studies and demanding clinical trials. On top of this, only a small fraction of the developed drug candidates eventually reach the market. The goal of this thesis is to address some of these challenges by using computational models to predict properties of biological drugs at an early stage of drug development. This allows for cheap and scalable screening of a large number of drug candidates in order to select and design the best-fitting lead candidates to progress through the drug development pipeline. The thesis focuses specifically on protein-based therapeutics (biologicals), with data originating from acylated peptide analogs as well as homologous protein sequences.

The thesis is divided into two parts. The first part presents an overview of protein representations used in computational protein design. This includes a detailed literature review of current representations for predicting drug pharmacokinetics (PK) in both small-molecule drugs and biologicals. Subsequently, the Machine Learning (ML) methods used in this thesis to predict and evaluate protein properties from the molecular representations are introduced.

The second part of the thesis evaluates and discusses the practical utility of the molecular representations in protein drug design. A graph representation based on evolutionary protein sequence data is presented to focus the search for therapeutic effects in early drug discovery. Furthermore, a combination of molecular representations for biologicals is proposed, which allows for early-stage, industrially applicable evaluation of PK parameters. It is found that representations that combine numeric physicochemical descriptors with embeddings from pre-trained deep learning models provide more accurate PK predictions. The practical use of the proposed representations is also investigated with the aim of minimizing new animal experiments, which continue to face increasing regulation. To this end, a two-step sampling strategy is proposed to minimize bias in historical data while also covering the chemical space of a new scientific project. It is shown that this method outperforms standard sampling techniques and thus, by requiring fewer animal studies, holds the potential to make the drug development pipeline more efficient and more ethical.
Original language: English
Publisher: Technical University of Denmark
Number of pages: 201
Publication status: Published - 2024
