Accelerating catalysis simulations using surrogate machine learning models

Research output: Book/Report › Ph.D. thesis

Climate change is evident, and it calls for an immediate global transition to a green and sustainable energy structure. However, an effective transition requires the discovery of new materials for solar cells, batteries, catalysts, etc. Artificial intelligence, or machine learning, has proven that it can significantly accelerate the search for new materials. Combined with an active learning approach, a Gaussian process can act as a self-taught machine learning method, since it predicts both energies and the corresponding uncertainty estimates. Thereby, a substantial amount of time is saved on the manual setup of databases and screenings for new materials.
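
As a rough illustration of this active learning idea, the sketch below fits a Gaussian process to a one-dimensional toy energy function and repeatedly evaluates the candidate with the largest predicted uncertainty. Everything in it is an assumption made for illustration and is not taken from the thesis: the squared-exponential kernel, the fixed hyperparameters, the `toy_energy` stand-in for a quantum mechanical calculation, and the maximum-uncertainty selection rule.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.5, prefactor=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return prefactor * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_predict(x_train, y_train, x_query, noise=1e-6):
    """Return the zero-mean GP posterior mean and standard deviation."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    chol = np.linalg.cholesky(k_tt)
    # Solve (K + noise * I) alpha = y through the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(v ** 2, axis=0)
    # Clip tiny negative variances caused by round-off.
    return mean, np.sqrt(np.clip(var, 0.0, None))


def toy_energy(x):
    """Hypothetical stand-in for an expensive quantum mechanical calculation."""
    return np.sin(3.0 * x) + 0.5 * x


def active_learning(n_steps=8):
    """Grow the training set by always sampling the most uncertain candidate."""
    candidates = np.linspace(0.0, 3.0, 61)
    x_train = np.array([0.0, 3.0])
    y_train = toy_energy(x_train)
    for _ in range(n_steps):
        _, std = gp_predict(x_train, y_train, candidates)
        x_next = candidates[np.argmax(std)]
        x_train = np.append(x_train, x_next)
        y_train = np.append(y_train, toy_energy(x_next))
    return x_train, y_train, candidates
```

In this toy setting the model chooses its own training points, so no database has to be assembled by hand; a practical scheme would typically also retrain the hyperparameters after each new calculation.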

In this thesis, the robustness of a Gaussian process is discussed, together with how common mistakes can be avoided when training it. A correction to the covariance matrix is derived, which ensures that exception errors are avoided when the Gaussian process is optimized. Furthermore, boundary conditions for the hyperparameters are defined, which make variable transformations of the hyperparameters possible. The variable transformations make the important regions of the hyperparameter space larger and more probable without restricting the hyperparameters. By applying the variable transformation, a new method is developed that globally optimizes the hyperparameters. The new method locates the global maximum for the hyperparameters in all the test systems with different training set sizes, which is not the case for any of the other investigated optimizers. Another important advantage of the new method is that the optimization time is shorter than for the other investigated global optimizers. Therefore, a new method has been implemented that makes the Gaussian process robust and reliable.
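
The abstract does not give the form of the covariance correction or of the variable transformations, so the following is only a generic sketch of the two ideas, not the method derived in the thesis. It assumes a common workaround in which a small diagonal jitter is added when the Cholesky factorization fails, and it assumes a simple logarithmic transformation so that the optimizer works on unbounded variables while the hyperparameters stay positive. The names `stable_cholesky` and `neg_log_likelihood` are hypothetical.

```python
import numpy as np


def stable_cholesky(cov, max_tries=8):
    """Cholesky-factorize cov, adding a growing diagonal jitter on failure.

    Returns the factor and the jitter that was finally used (0.0 if none).
    """
    jitter = 0.0
    scale = np.mean(np.diag(cov))
    for attempt in range(max_tries):
        try:
            chol = np.linalg.cholesky(cov + jitter * np.eye(len(cov)))
            return chol, jitter
        except np.linalg.LinAlgError:
            # Next try: scale * 1e-10, then 1e-9, ... (assumed schedule).
            jitter = scale * 10.0 ** (attempt - 10)
    raise np.linalg.LinAlgError("covariance matrix is not positive definite")


def neg_log_likelihood(log_theta, x, y):
    """Negative log-likelihood as a function of log-hyperparameters.

    log_theta holds the logarithms of (length scale, prefactor, noise), so
    any real-valued input maps to strictly positive hyperparameters.
    """
    length_scale, prefactor, noise = np.exp(log_theta)
    diff = x[:, None] - x[None, :]
    cov = prefactor * np.exp(-0.5 * (diff / length_scale) ** 2)
    cov = cov + noise * np.eye(len(x))
    # If jitter is needed, the likelihood refers to the jittered matrix.
    chol, _ = stable_cholesky(cov)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    return (0.5 * y @ alpha
            + np.sum(np.log(np.diag(chol)))
            + 0.5 * len(x) * np.log(2.0 * np.pi))
```

With such a wrapper, an optimizer can probe extreme hyperparameter values without the factorization raising an exception, which is the kind of failure the correction described above is meant to prevent.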

Different objective functions are tested to investigate whether they improve the Gaussian process. The most commonly used objective function, the log-likelihood, is confirmed to be the best in terms of the prediction of energies and uncertainties for the chosen test systems. The evaluation was made possible by a newly defined uncertainty measure. The uncertainty predictions from the Gaussian process are improved by modifying the solution obtained from the log-likelihood, without changing the energy predictions or increasing the computational cost.
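
For reference, the log-likelihood objective and the resulting predictions take the following standard textbook form for a Gaussian process with zero prior mean. The notation is an assumption introduced here and is not taken from the thesis: $N$ training energies $\mathbf{y}$, kernel matrix $K_{\boldsymbol{\theta}}$ with hyperparameters $\boldsymbol{\theta}$, noise variance $\sigma_n^{2}$, and $\mathbf{k}_*$ the covariance vector between a new point $\mathbf{x}_*$ and the training points.

```latex
\begin{align}
\ln p(\mathbf{y}\mid X,\boldsymbol{\theta})
  &= -\tfrac{1}{2}\,\mathbf{y}^{\top} C^{-1}\mathbf{y}
     -\tfrac{1}{2}\ln\lvert C\rvert
     -\tfrac{N}{2}\ln 2\pi,
  \qquad C = K_{\boldsymbol{\theta}} + \sigma_n^{2} I, \\
\mu(\mathbf{x}_*) &= \mathbf{k}_*^{\top} C^{-1}\mathbf{y}, \\
\sigma^{2}(\mathbf{x}_*) &= k(\mathbf{x}_*,\mathbf{x}_*)
  - \mathbf{k}_*^{\top} C^{-1}\mathbf{k}_* .
\end{align}
```

Maximizing the first expression over $\boldsymbol{\theta}$ gives the hyperparameters; the second and third expressions are the predicted energy and its variance.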

The uncertainty predictions are also improved by deriving a new process called a Student’s t process. The new process gives the same energy predictions as the Gaussian process, but it has one fewer hyperparameter, which is removed with a Bayesian approach. The fully Bayesian solution to the predictions of the energies and uncertainties is approximated by applying the Kullback-Leibler divergence, which substantially improves the uncertainty predictions. Unlike a fully Bayesian solution, the approximated solution does not require retraining of the Gaussian process to predict a new point.
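
One textbook route to a Student’s t predictive distribution, shown here only as an illustration, is to integrate out the kernel prefactor under an inverse-gamma prior. The prior, its parameters $a$ and $b$, and the notation are assumptions and may differ from the derivation in the thesis. Here $\tilde{C}$, $\tilde{\mathbf{k}}_*$ and $\tilde{k}_{**}$ are the covariance matrix, covariance vector and prior variance computed with the prefactor $\sigma_f^{2}$ factored out, and $N$ is the number of training points.

```latex
\begin{align}
\mathbf{y}\mid\sigma_f^{2} &\sim \mathcal{N}\!\left(\mathbf{0},\ \sigma_f^{2}\tilde{C}\right),
\qquad \sigma_f^{2}\sim\text{Inv-Gamma}(a,b), \\
f_*\mid\mathbf{y} &\sim t_{\nu}\!\left(\mu_*,\ s_*^{2}\right),
\qquad \nu = 2a+N, \\
\mu_* &= \tilde{\mathbf{k}}_*^{\top}\tilde{C}^{-1}\mathbf{y}, \\
s_*^{2} &= \frac{2b+\mathbf{y}^{\top}\tilde{C}^{-1}\mathbf{y}}{2a+N}
\left(\tilde{k}_{**}-\tilde{\mathbf{k}}_*^{\top}\tilde{C}^{-1}\tilde{\mathbf{k}}_*\right).
\end{align}
```

The location $\mu_*$ coincides with the Gaussian process mean, which is consistent with the statement that the energy predictions are unchanged, while the scale $s_*^{2}$ is set by the data rather than by a separately optimized prefactor.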

A structure optimization method for finding the most stable adsorption structure on any surface is developed and presented. The optimization method finds the most stable adsorption structures for all tested systems. Furthermore, the number of quantum mechanical calculations is reduced by a factor of 40. This reduction is expected to be even larger for more complex surfaces. The new robust Student’s t process is implemented in a new version of the machine learning accelerated Nudged Elastic Band method, which is essential for finding activation energies. The number of quantum mechanical calculations required by the Nudged Elastic Band method is reduced by a factor of 200. Therefore, it is expected that the developed robust methods can be powerful tools in automated material discovery.
Original language: English
Place of Publication: Kgs. Lyngby
Publisher: Technical University of Denmark
Number of pages: 120
Publication status: Published - 2023

