Potential for Machine Learning to Address Data Gaps in Human Toxicity and Ecotoxicity Characterization

Kerstin von Borries*, Hanna Holmquist, Marissa Kosnik, Katie V. Beckwith, Olivier Jolliet, Jonathan M. Goodman, Peter Fantke*

*Corresponding author for this work

Research output: Contribution to journal, Journal article, Research, peer-review


Machine learning (ML) is increasingly applied to fill data gaps in assessments that quantify impacts associated with chemical emissions and chemicals in products. However, the systematic application of ML-based approaches to fill chemical data gaps is still limited, and their potential for addressing a wide range of chemicals is unknown. To inform ML model development, we prioritized chemical-related parameters for chemical toxicity characterization based on two criteria: (1) each parameter's relevance for robustly characterizing chemical toxicity, described by the uncertainty in characterization results attributable to that parameter, and (2) the potential for ML-based approaches to predict parameter values for a wide range of chemicals, described by the availability of chemicals with measured parameter data. We prioritized 13 out of 38 parameters for developing ML-based approaches, while flagging another nine with critical data gaps. For all prioritized parameters, we performed a chemical space analysis to further assess the potential for ML-based approaches to predict data for diverse chemicals, considering the structural diversity of available measured data. This analysis shows that ML-based approaches can potentially predict data for 8-46% of marketed chemicals based on the 1-10% with available measured data. Our results can systematically inform future ML model development efforts to address data gaps in chemical toxicity characterization.
Original language: English
Journal: Environmental Science and Technology
Issue number: 46
Pages (from-to): 18259-18270
Number of pages: 12
Publication status: Published - 2023


  • Prioritization
  • Uncertainty
  • Chemical space
  • Chemical properties
  • Life cycle impact assessment
  • Chemical substitution
  • Risk screening
  • Safe and sustainable by design

