Abstract
Chemicals are vital to modern society, but their rapid production and use lead to hazardous emissions, affecting human and ecosystem health. Chemical toxicity characterization is an essential tool to help assess and mitigate these impacts but requires diverse chemical input data that are unavailable for most of the >350,000 globally registered chemicals and mixtures. Machine learning (ML) methods have achieved remarkable predictive performance across scientific fields and offer high potential to fill these data gaps across input parameters and chemicals. However, the systematic uptake of ML methods to address data gaps in chemical toxicity characterization remains limited due to challenges undermining confidence in their predictions. In particular, ML’s limited extrapolative capacity constrains reliable predictions to domains represented within the training data. This obscures which data gaps across various chemicals and input parameters can be effectively addressed by developing ML prediction methods. Further, integrating predictions from different ML models in chemical assessments requires quantifying uncertainty in predictions to account for input data quality variations. However, uncertainty quantification is challenging and not standard in ML practice. Therefore, practical examples of developing and integrating ML-based predictions with quantified uncertainty into chemical toxicity characterization are urgently needed to build trust in prediction-based chemical assessments.
The work presented in this PhD thesis addresses these challenges by focusing on four research objectives: (1) Prioritize input parameters in characterizing human toxicity and ecotoxicity impacts for developing ML-based approaches based on their relevance for obtaining robust characterization results and their suitability for ML. (2) Analyze the chemical space covered by ML-based approaches trained with available measured data for relevant input parameters for characterizing human toxicity and ecotoxicity impacts relative to the global chemical market. (3) Identify and develop suitable ML-based approaches capable of quantifying data- and model-related uncertainty in predictions across diverse chemical structures. (4) Demonstrate the use of uncertain ML-based predictions to gain insights on chemical toxicity for the global chemical market and for designing safer and more sustainable chemical synthesis using illustrative case studies.
Following an introductory chapter, chapter 2 presents a framework for prioritizing input data gaps in chemical toxicity characterization, prioritizing 13 out of its 38 input parameters for ML model development while flagging nine additional parameters with critical data gaps. Chapter 3 offers an assessment of the potential for ML to address the broader realm of >130,000 marketed chemicals for the prioritized parameters, finding that based on 1 to 10% of available data, ML can potentially predict 8 to 46% of marketed chemicals. This predictive potential was highly dependent on the chemical diversity represented in the available input parameter data. These results demonstrated that ML can significantly contribute to filling data gaps in chemical toxicity characterization. However, several crucial input parameters and more than 50% of marketed chemicals across prioritized parameters remained difficult to address. Chapters 2 and 3 highlighted that strategic efforts are needed to increase data availability focusing on data diversity and advanced modeling approaches to leverage alternative data sources and domain knowledge. Chapter 4 presents an approach for developing uncertainty-aware ML models that transparently communicate prediction reliability through fully quantified uncertainty intervals. It demonstrates that both conformal prediction (CP) and Bayesian neural networks (BNN) can provide robust estimates of prediction reliability by providing quantified uncertainty ranges that effectively address various aspects of data- and model-related uncertainty. Developing uncertainty-aware models caused no substantial loss of predictive performance compared to standard ML approaches without uncertainty quantification. Additionally, these models harnessed the full potential of available data, enabling robust predictions across a broader range of chemicals by accurately reflecting differences in prediction reliability.
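As a rough illustration of how conformal prediction turns point predictions into uncertainty intervals, the following minimal sketch implements generic split conformal regression with absolute residuals as the nonconformity score. It is not the thesis's implementation: the function name `split_conformal_interval`, the synthetic data, and the stand-in "model" (a fixed linear function) are all assumptions made for illustration only.

```python
import numpy as np


def split_conformal_interval(cal_true, cal_pred, new_pred, alpha=0.1):
    """Split conformal intervals with roughly (1 - alpha) marginal coverage.

    cal_true / cal_pred: measured values and model predictions on a held-out
    calibration set that was NOT used to train the model.
    new_pred: point predictions for new samples.
    """
    cal_true = np.asarray(cal_true, dtype=float)
    cal_pred = np.asarray(cal_pred, dtype=float)
    new_pred = np.asarray(new_pred, dtype=float)

    # Nonconformity score: absolute residual on the calibration set.
    scores = np.sort(np.abs(cal_true - cal_pred))
    n = scores.size

    # Finite-sample corrected rank of the calibration quantile.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.inf if k > n else scores[k - 1]

    return new_pred - q, new_pred + q


def demo_coverage(seed=0, n_cal=1000, n_test=1000, alpha=0.1):
    """Empirical coverage on synthetic data (hypothetical example)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 10.0, n_cal + n_test)
    y = 2.0 * x + rng.normal(0.0, 1.0, x.size)  # synthetic "measurements"
    pred = 2.0 * x  # stand-in for any trained regressor's predictions

    lower, upper = split_conformal_interval(
        y[:n_cal], pred[:n_cal], pred[n_cal:], alpha=alpha
    )
    covered = (y[n_cal:] >= lower) & (y[n_cal:] <= upper)
    return float(covered.mean())


if __name__ == "__main__":
    print(f"Empirical coverage at alpha=0.1: {demo_coverage():.3f}")
```

In this basic form every prediction receives the same interval width; reflecting differences in reliability between chemicals, as described above, would require a normalized or locally adaptive score, which this sketch deliberately leaves out.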
Chapter 5 applies this approach to predict human non-cancer toxicity points of departure (PODs) for >130,000 marketed chemicals, identifying chemical classes with high toxicity and significant prediction uncertainty. These results fill critical data gaps by providing predictions for many marketed chemicals with no prior estimates and guide future research to enhance predictions for chemical classes with high uncertainty, such as metals, inorganics, and macromolecules. Chapter 5 further explores practical challenges in applying digital tools to obtain robust toxicity characterization results through an illustrative case study aiming to identify safer building blocks for enzymatic amide bond synthesis. By propagating (semi)-quantitative input parameter uncertainties, the results revealed significant deviations between probabilistic best estimates and deterministic results, as well as large uncertainties that hindered reliable identification of toxicity impact differences among similar chemicals. This underscored the importance of quantifying uncertainty in input data predictions to obtain robust conclusions in comparative chemical assessments.
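One reason probabilistic best estimates can deviate from deterministic results is that uncertain inputs are often right-skewed. The sketch below is a toy Monte Carlo propagation, assuming a purely multiplicative model with independent lognormal inputs described by a median and a geometric standard deviation (GSD). The function name `propagate_lognormal` and all numbers are hypothetical and are not taken from the thesis.

```python
import numpy as np


def propagate_lognormal(medians, gsds, n_samples=100_000, seed=0):
    """Propagate independent lognormal input uncertainties through a product.

    medians: median (deterministic) value of each input factor.
    gsds: geometric standard deviation of each input factor (>= 1).
    Returns summary statistics of the product of all factors.
    """
    rng = np.random.default_rng(seed)
    medians = np.asarray(medians, dtype=float)
    sigmas = np.log(np.asarray(gsds, dtype=float))

    # One row per Monte Carlo draw, one column per input factor.
    samples = rng.lognormal(
        mean=np.log(medians), sigma=sigmas, size=(n_samples, medians.size)
    )
    product = samples.prod(axis=1)

    return {
        "deterministic": float(medians.prod()),  # product of median inputs
        "median": float(np.median(product)),
        "mean": float(product.mean()),  # probabilistic expected value
        "p2.5": float(np.percentile(product, 2.5)),
        "p97.5": float(np.percentile(product, 97.5)),
    }


if __name__ == "__main__":
    # Three hypothetical input factors with increasing uncertainty.
    result = propagate_lognormal(medians=[2.0, 3.0, 0.5], gsds=[2.0, 3.0, 5.0])
    for name, value in result.items():
        print(f"{name:>13}: {value:.3g}")
```

Under these assumptions the median of the sampled product stays close to the deterministic value, while the mean is pulled upward by the long right tail, and the 95% interval spans several orders of magnitude. Intervals that wide can easily overlap between similar chemicals, which is the kind of situation in which a ranking cannot be established reliably.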
This PhD project built the foundation for developing fit-for-purpose digital prediction tools for chemical toxicity characterization, offering a comprehensive view of critical gaps and strategies for addressing them. By prioritizing relevant input parameters and establishing the chemical target domain as a benchmark for the predictive capabilities required from ML-based approaches, the presented approaches guide future efforts for data curation and ML model development that can systematically enhance the availability and robustness of chemical toxicity characterization results. As an essential aspect, this project demonstrated the development of uncertainty-aware ML and its importance for effectively integrating predictions in chemical toxicity characterization to obtain robust conclusions from prediction-based chemical assessments. This approach also significantly improved data availability across globally marketed chemicals, as it made it possible to provide predictions with quantified uncertainty for highly diverse chemicals. Applying it to fill data gaps for other critical input parameters holds practical relevance for industry and academia, as it would significantly improve the availability and robustness of chemical risk and impact assessments, opening new possibilities for comparing alternatives across marketed chemicals and new chemical designs to create safer and more sustainable products. This PhD project thereby made a substantial contribution to the fields of chemical impact and risk assessment to support effective chemical management in minimizing chemical impacts on humans and ecosystems.
| Original language | English |
|---|---|
| Place of Publication | Kgs. Lyngby |
| Publisher | Technical University of Denmark |
| Number of pages | 309 |
| Publication status | Published - 2024 |
Projects
PhD Kerstin von Borries: Advancing life cycle based chemical toxicity characterization through digitalization
Borries, K. V. (PhD Student), Holmquist, H. M. (Supervisor), Kosnik, M. B. (Supervisor), Kristiansson, E. (Examiner), Jolliet, O. (Main Supervisor), Wambaugh, J. (Examiner) & Fantke, P. (Supervisor)
01/08/2021 → 02/12/2024
Project: PhD