Machine Learning for Molecular Science

Mathias Schreiner

Research output: Book/Report (Ph.D. thesis)


Abstract

Machine Learning (ML) methods, and Neural Network (NN) models in particular, have in recent years proved to be capable emulators of expensive ab initio methods for electronic-structure calculations while running several orders of magnitude faster. These models are gradually transforming computational quantum chemistry: accurate predictions of molecular properties can be obtained at unprecedented speeds, opening up an exciting array of new possibilities.

In this thesis, I explore how NNs can be used to accelerate Transition State (TS) search and multi-time-scale simulation of molecular systems. The thesis covers fundamental topics in the physics of stochastic processes and quantum mechanics, methods and challenges related to calculating the electronic structure of molecules, and an introduction to the NN architectures used in this work.

My work has resulted in three notable scientific contributions, which are presented in the thesis. The first of these is the Transition1x dataset. It consists of Density Functional Theory (DFT) calculations for 10M molecular configurations, sampled with Nudged Elastic Band (NEB) around the reaction pathways of 10K different reactions involving H, C, N, and O. The dataset is intended for training NN models on tasks related to chemical reactions.

In the second contribution, NeuralNEB, NNs are trained on various datasets and evaluated on their ability to act as Potential Energy Surfaces (PESs) for NEB. Models trained on Transition1x are shown to outperform models trained on other datasets, underlining the importance of training data that is specific to the task.

Finally, the Implicit Transfer Operator Learning framework is presented. Here, conditional Denoising Diffusion Probabilistic Models (DDPMs) are trained using a new data-augmentation scheme in which training data, in the form of trajectories from Molecular Dynamics (MD) simulations, are augmented by sampling different lag times during training. With this scheme, our models demonstrate the ability to capture dynamics at a range of timescales, providing a crucial step forward in multiple-time-resolution MD.
Original language: English
Publisher: Technical University of Denmark
Number of pages: 158
Publication status: Published - 2023

  • Machine Learning for Molecular Science

    Schreiner, J. M. (PhD Student), Winther, O. (Main Supervisor), Vegge, T. (Supervisor), Boomsma, W. K. (Examiner) & Csanyi, G. (Examiner)

    01/09/2020 to 11/03/2024

    Project: PhD
