Generation of Computational Data Sets for Machine Learning Applied to Battery Materials

Research output: Chapter in Book/Report/Conference proceeding; Book chapter; Research; peer-review


Understanding and improving the performance of a battery requires deep knowledge of a plethora of phenomena that occur on a wide range of length and time scales. For predictive modelling of batteries, the length scales span from the atomistic level, where quantities such as the open circuit voltage (OCV) can be determined, to the device level, which is employed in battery management systems. The modelling time scales start at the picosecond-to-nanosecond scale, to capture phenomena such as electron transfer and ionic diffusion, and reach the scale of years in order to describe battery aging. Similarly, the length scales span from angstroms to meters. No single modelling framework can cover all these scales simultaneously, but specific simulation tools have been developed to treat each of them independently. The biggest challenge in building a reliable multiscale approach to model batteries is to feed the larger-scale levels, in both length and time, with appropriate parameters. Ideally, these parameters for mesoscopic and macroscopic simulations would be derived from the output of microscopic-scale calculations. In this regard, Machine Learning algorithms (MLAs) and related data-driven approaches are showing great promise to accelerate the coupling between the different simulation scales.
Machine learning is a subfield of AI in which an algorithm learns from examples to establish a functional mapping from input to output, and improves the mapping upon training. MLAs therefore rely on a ‘training set’ of systems from which the algorithm learns. Each system in the training set is described by a ‘vector descriptor’, which encodes the computed material in a unique and meaningful way. The training set also contains a number of target properties for each material. If the training set is large enough, the MLA can learn how the vector descriptors and the targeted properties are correlated. In order to obtain reliable outputs from the MLA, the data in the training set must be reliable. On the other hand, producing data to train the MLAs can be time-consuming. Thus, the method to produce the data in the training set must also be affordable. The delicate balance between reliability and affordability depends on the targeted property. In some cases, it is better to have a vast amount of data with moderate fidelity, while in other situations it is more convenient to use a limited amount of high-fidelity data (here fidelity is understood as the degree to which a simulation reproduces the state and evolution of a set of given properties of a physically real entity). In this chapter, we present examples of the two situations in the context of microscopic modelling of batteries. First, we show how to produce a large set of data with moderate fidelity by means of a computational workflow. That workflow, based on Density Functional Theory (DFT) simulations, is able to predict open circuit voltages (OCVs) and diffusivities of electrode materials that can later be used as input parameters in macroscopic models based on Finite Difference Elements (FDE). Secondly, we present how to produce high-fidelity data on the formation of solid electrolyte interphases (SEIs) by means of ab initio molecular dynamics.
We conclude by illustrating how these data sets are employed to train the MLAs.
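As a minimal illustration of the descriptor-to-target mapping described above, the following Python sketch fits a ridge regression to a training set of descriptor vectors and one target property. Everything in it is an assumption made for illustration: the data are synthetic random numbers rather than DFT results, the function names `fit_ridge` and `predict` are hypothetical, and a linear model is chosen only for brevity; it is not the method used in the chapter.

```python
import numpy as np

def fit_ridge(X, y, alpha=1e-3):
    """Return weights (last entry is the bias) minimizing
    ||Xb @ w - y||^2 + alpha * ||w||^2, where Xb is X with a column of ones."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict(w, X):
    """Apply the learned linear mapping to new descriptor vectors."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

# Synthetic training set (assumption): 200 "materials", each described by a
# 4-component descriptor vector, with one noisy target property per material.
rng = np.random.default_rng(0)
n_materials, n_features = 200, 4
X_train = rng.normal(size=(n_materials, n_features))
true_w = np.array([0.5, -1.0, 0.25, 2.0])  # hypothetical underlying relation
y_train = X_train @ true_w + 3.0 + rng.normal(scale=0.05, size=n_materials)

# Training: learn how descriptors and the target property are correlated.
w = fit_ridge(X_train, y_train)

# Prediction for descriptor vectors that were not in the training set.
X_new = rng.normal(size=(5, n_features))
y_pred = predict(w, X_new)
```

In this toy setting the noise level plays the role of data fidelity and `n_materials` the role of training-set size, so the trade-off discussed above can be explored by varying the two.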
This chapter is arranged as follows. Section 11.2 presents the global structure of the workflow that produces a large set of moderate-fidelity data on OCVs, mechanical stability, and cation diffusivity in intercalation electrodes. This workflow relies on several novel computational techniques which contribute to accelerating the data production and enhancing its reliability. In subsection 11.2.1 we show how diffusivity is calculated within the workflow, explaining how reflective symmetry can be exploited to boost Nudged Elastic Band (NEB) calculations and discussing the importance of choosing the right exchange-correlation functional. Subsection 11.2.2 deals with the modelling of disorder in battery electrode materials, which is also part of the workflow. Section 11.3 shows one example of computational production of high-fidelity data, namely the use of ab initio molecular dynamics to understand the reduction reactions that lead to the first stages in the formation of SEIs. We conclude with Section 11.4 on MLAs, explaining how they can help to predict the synthesizability and structure of battery materials and the evolution of interfaces based on high- and moderate-fidelity computational data.
Original language: English
Title of host publication: Atomic-Scale Modelling of Electrochemical Systems
Editors: Marko M. Melander, Tomi T. Laurila, Kari Laasonen
Number of pages: 26
Publication date: 2022
ISBN (Print): 9781119605614
ISBN (Electronic): 9781119605652
Publication status: Published - 2022

