Precision and Trust: Algorithms for Substance Identification and Collaborative Learning

Bo Li

Research output: Book/Report › Ph.D. thesis

Abstract

Rapidly identifying unknown substances is critical in many fields, such as bacterial detection and explosives quantification. Spectroscopic techniques such as Raman spectroscopy and surface-enhanced Raman spectroscopy (SERS) have emerged as cost-effective and reusable options and have attracted significant attention. Traditional approaches to analyzing spectra often involve a two-stage pipeline of data preprocessing and modelling, which demands extensive domain knowledge and incurs substantial costs. It is therefore critical to develop deep neural networks that identify substances directly from raw spectra and can be optimized end-to-end.

A standard strategy for optimizing deep neural networks is to centralize the training data, as the precision of a model can be positively related to the training data size. While effective, this strategy can compromise users’ privacy and data confidentiality in specific scenarios, such as healthcare applications or explosives detection. Therefore, allowing users to collaboratively learn a deep neural network without privacy leakage is gaining attention. Federated learning is one of the commonly used frameworks for achieving this.

This thesis aims to advance the state of the art in substance identification and to lay the groundwork for effective collaboration towards building a trustworthy and automated tool for chemical threat detection.

Specifically, we present a contrastive learning-based deep neural network for identifying Raman spectra and a vision-transformer-based neural network for detecting nitroaromatic explosives and quantifying their concentration from SERS maps. Both methods are optimized end-to-end, eliminating the need for preprocessing, and have demonstrated performance superior to or on par with existing state-of-the-art approaches at the time of publication. To enhance trustworthy collaboration, we address the unavoidable data heterogeneity challenge in federated learning. We present two frameworks that accelerate the convergence of non-convex federated learning in the context of heterogeneous workers. Additionally, towards developing a federated learning algorithm with a privacy focus, we perform a rigorous analysis of the impact of gradient clipping and noise permutation, which are necessary techniques for guaranteeing privacy, on the convergence of the federated learning algorithm.

Original language: English
Publisher: Technical University of Denmark
Number of pages: 224
Publication status: Published - 2024
