TY - JOUR
T1 - Decomposing Chemical Space
T2 - Applications to the Machine Learning of Atomic Energies
AU - Kjeldal, Frederik I.
AU - Eriksen, Janus J.
N1 - Publisher Copyright:
© 2023 American Chemical Society.
PY - 2023
Y1 - 2023
N2 - We apply a number of atomic decomposition schemes across the standard QM7 data set─a small model set of organic molecules at equilibrium geometry─to inspect the possible emergence of trends among contributions to atomization energies from distinct elements embedded within molecules. Specifically, a recent decomposition scheme of ours based on spatially localized molecular orbitals is compared to alternatives that instead partition molecular energies on account of which nuclei individual atomic orbitals are centered on. We find these partitioning schemes to expose the composition of chemical compound space in very dissimilar ways in terms of the grouping, binning, and heterogeneity of discrete atomic contributions, e.g., those associated with hydrogens bonded to different heavy atoms. Furthermore, unphysical dependencies on the one-electron basis set are found for some, but not all of these schemes. The relevance and importance of these compositional factors for training tailored neural network models based on atomic energies are next assessed. We identify both limitations and possible advantages with respect to contemporary machine learning models and discuss the design of potential counterparts based on atoms and the intrinsic energies of these as the principal decomposition units.
AB - We apply a number of atomic decomposition schemes across the standard QM7 data set─a small model set of organic molecules at equilibrium geometry─to inspect the possible emergence of trends among contributions to atomization energies from distinct elements embedded within molecules. Specifically, a recent decomposition scheme of ours based on spatially localized molecular orbitals is compared to alternatives that instead partition molecular energies on account of which nuclei individual atomic orbitals are centered on. We find these partitioning schemes to expose the composition of chemical compound space in very dissimilar ways in terms of the grouping, binning, and heterogeneity of discrete atomic contributions, e.g., those associated with hydrogens bonded to different heavy atoms. Furthermore, unphysical dependencies on the one-electron basis set are found for some, but not all of these schemes. The relevance and importance of these compositional factors for training tailored neural network models based on atomic energies are next assessed. We identify both limitations and possible advantages with respect to contemporary machine learning models and discuss the design of potential counterparts based on atoms and the intrinsic energies of these as the principal decomposition units.
U2 - 10.1021/acs.jctc.2c01290
DO - 10.1021/acs.jctc.2c01290
M3 - Journal article
C2 - 36926874
AN - SCOPUS:85150435487
SN - 1549-9618
VL - 19
SP - 2029−2038
JO - Journal of Chemical Theory and Computation
JF - Journal of Chemical Theory and Computation
ER -