Training sets based on uncertainty estimates in the cluster-expansion method

David Kleiven*, Jaakko Akola, Andrew Alvin Peterson, Tejs Vegge, Jinhyun Chang*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review



Cluster expansion (CE) has become increasingly popular in recent years, and its applications go far beyond its original roots in binary alloys, reaching even the complex crystalline systems often used in energy materials research. As with other modern machine learning approaches in materials science, many strategies have been proposed for training and fitting CE models to first-principles calculation results. Here, we propose a new strategy for constructing a training set in which structures are chosen based on their relevance in Monte Carlo sampling for statistical analysis and on the reduction of the expected error. The CE model constructed with the proposed approach depends less on the specific details of the training set, thereby increasing the reproducibility of the model. The same method can be applied to other machine learning approaches where it is desirable to sample the relevant configurational space with a small set of training data, which is often the case when the data consist of first-principles calculations.
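The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea suggested by the abstract and the "Bootstrapping" keyword: fit a linear CE-like model on bootstrap resamples of a small training set, then rank candidate structures by the spread of the predicted energies. Everything here is an assumption made for illustration, not the authors' method: the toy 1D spin chain, the four correlation functions, the hypothetical coefficients in `true_eci`, and the random candidate pool (which in a real workflow would presumably come from Monte Carlo sampling, with energies from first-principles calculations).

```python
import numpy as np

def correlation_features(spins):
    """Toy cluster correlations for a periodic 1D chain of +/-1 spins:
    constant term, site average, nearest- and next-nearest-neighbour pair averages."""
    s = np.asarray(spins, dtype=float)
    return np.array([
        1.0,
        s.mean(),
        (s * np.roll(s, 1)).mean(),
        (s * np.roll(s, 2)).mean(),
    ])

def bootstrap_eci(X, y, n_boot, rng):
    """Least-squares coefficients fitted on n_boot training sets
    resampled with replacement from (X, y)."""
    n = len(y)
    fits = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        coef, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        fits.append(coef)
    return np.array(fits)

def prediction_std(ecis, X_candidates):
    """Standard deviation of the predicted energy across the bootstrap
    ensemble, one value per candidate structure."""
    return (X_candidates @ ecis.T).std(axis=1)

rng = np.random.default_rng(0)
n_sites = 12

# Hypothetical "true" coefficients standing in for first-principles energies.
true_eci = np.array([-1.0, 0.2, 0.5, -0.1])

# Small initial training set with a little synthetic noise.
train = rng.choice([-1, 1], size=(15, n_sites))
X_train = np.array([correlation_features(s) for s in train])
y_train = X_train @ true_eci + rng.normal(0.0, 0.01, size=len(X_train))

# Candidate pool: random here; assumed to come from Monte Carlo sampling in practice.
candidates = rng.choice([-1, 1], size=(50, n_sites))
X_cand = np.array([correlation_features(s) for s in candidates])

ecis = bootstrap_eci(X_train, y_train, n_boot=200, rng=rng)
sigma = prediction_std(ecis, X_cand)

# The candidate with the largest predicted-energy spread is the next one to compute.
next_index = int(np.argmax(sigma))
print("most uncertain candidate:", next_index)
```

In this sketch the selected structure would be added to the training set and the loop repeated, so that new calculations are spent where the ensemble of fitted models disagrees most.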

Original language: English
Article number: 034012
Journal: JPhys Energy
Issue number: 3
Publication status: Published - 2021

Bibliographical note

Funding Information:
The authors acknowledge support from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 957189.

Publisher Copyright:
© 2021 The Author(s). Published by IOP Publishing Ltd.


  • Bootstrapping
  • Cluster expansion
  • Energy materials
  • Machine learning
  • Monte Carlo
  • Phase transition

