MultiXC-QM9: Large dataset of molecular and reaction energies from multi-level quantum chemical methods

Surajit Nandi, Tejs Vegge, Arghya Bhowmik*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

Well-curated, extensive datasets have spurred intense development of molecular machine learning (ML) methods over the last few years and have encouraged non-chemists to join the effort. The QM9 dataset is one of the benchmark databases for small molecules, with molecular energies based on the B3LYP functional. G4MP2-based energies of these molecules were published later. To enable a wide variety of ML tasks with QM9 molecules, such as transfer learning, delta learning, and multitask learning, we introduce in this article a new dataset with QM9 molecular energies estimated with 76 different DFT functionals and three different basis sets (228 energy values for each molecule). We additionally enumerated all possible A ↔ B monomolecular interconversions within the QM9 dataset and provide the reaction energies based on these 76 functionals and three basis sets. Lastly, we also provide the bond changes for all 162 million reactions in the dataset to enable ML-based structure- and bond-based reaction energy prediction tools.
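As an illustration of the A ↔ B enumeration described in the abstract, the following minimal Python sketch groups molecules by molecular formula (a monomolecular interconversion must conserve atoms, so A and B are isomers) and computes a reaction energy E(B) − E(A) for every shared functional/basis-set combination. The record layout, the names `molecules` and `enumerate_reactions`, the level labels, and all energy values are hypothetical placeholders; they are not the dataset's actual file format or data.

```python
from collections import defaultdict
from itertools import combinations

# 76 functionals x 3 basis sets gives the 228 energies per molecule quoted above.
N_LEVELS = 76 * 3

# Hypothetical toy records: (molecule id, molecular formula, energies per level).
# Level labels and energy values are made-up placeholders, not dataset values.
molecules = [
    ("mol_a", "C2H6O", {("functional_1", "basis_1"): -10.0,
                        ("functional_2", "basis_1"): -10.5}),
    ("mol_b", "C2H6O", {("functional_1", "basis_1"): -9.0,
                        ("functional_2", "basis_1"): -9.75}),
    ("mol_c", "CH4", {("functional_1", "basis_1"): -5.0,
                      ("functional_2", "basis_1"): -5.25}),
]


def enumerate_reactions(records):
    """Return (A, B, {level: E(B) - E(A)}) for every pair of isomers."""
    # Only molecules with the same formula can interconvert as A <-> B.
    by_formula = defaultdict(list)
    for mol_id, formula, energies in records:
        by_formula[formula].append((mol_id, energies))

    reactions = []
    for isomers in by_formula.values():
        # Each unordered pair is listed once; the reverse reaction B -> A
        # has the same energies with opposite sign.
        for (a_id, a_e), (b_id, b_e) in combinations(isomers, 2):
            shared_levels = a_e.keys() & b_e.keys()
            delta = {level: b_e[level] - a_e[level] for level in shared_levels}
            reactions.append((a_id, b_id, delta))
    return reactions


reactions = enumerate_reactions(molecules)
```

In this toy input only `mol_a` and `mol_b` share a formula, so a single reaction is produced. The difference between two levels of theory for the same reaction is the kind of quantity a delta-learning model would be trained to predict.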
Original language: English
Article number: 783
Journal: Scientific Data
Volume: 10
Issue number: 1
Number of pages: 6
ISSN: 2052-4463
Publication status: Published - 2023
