Active Learning for Data Streams
Abstract
As businesses increasingly rely on machine learning models to make informed decisions, the ability to develop accurate and reliable models is critical. However, in many industrial contexts, data annotation represents a major bottleneck to the training and deployment of predictive models. This thesis focuses on data-efficient strategies for developing machine learning models in label-scarce settings. The increasing availability of unlabeled data in various applications has led to the need for efficient methods that minimize the cost associated with collecting labeled observations. Traditional active learning approaches, such as pool-based methods, have been extensively studied, but the emergence of data streams has necessitated the development of stream-based active learning strategies able to select the most informative observations from data streams in real time.
The thesis begins with a survey of active learning, providing an overview of recently proposed approaches for selecting informative observations from data streams. It presents the strengths and limitations of the state of the art and discusses the challenges and opportunities that arise in this area of research. Next, the thesis presents a novel stream-based active learning strategy for linear models, inspired by optimal experimental design theory. By setting a threshold on the informativeness of unlabeled data points, the proposed strategy enables the learner to decide in real time whether to label an instance or discard it. Then, the thesis investigates the robustness of online active learning in the presence of outliers and irrelevant features. The thesis also provides initial results on an adaptive sampling scheme for drifting regression data streams.
Finally, the thesis presents a stream-based active distillation framework for developing lightweight yet powerful object detection models. This approach combines active learning and knowledge distillation, allowing a compact student model to be fine-tuned using pseudo-labels generated by a large pre-trained teacher model.
Overall, this thesis contributes to the field of stream-based active learning by providing insights into various techniques and addressing concerns related to robustness and scalability. The findings expand the potential applications of active learning in real-time data streams and pave the way for more efficient and effective model development.
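The threshold-based labeling decision described in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual algorithm: it assumes, for illustration only, that informativeness is measured by the prediction variance term x^T (X^T X)^{-1} x of a linear model (a quantity familiar from optimal experimental design), that the information matrix starts from a ridge term, and that the threshold is fixed. The function names `leverage`, `sherman_morrison_update`, and `select_from_stream`, and the parameters `threshold` and `ridge`, are hypothetical.

```python
def leverage(P, x):
    """Return x^T P x, where P approximates the inverse information matrix."""
    d = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(d)) for i in range(d)]
    return sum(x[i] * Px[i] for i in range(d))


def sherman_morrison_update(P, x):
    """Return (P^{-1} + x x^T)^{-1} using the Sherman-Morrison formula.

    P is assumed symmetric, so x^T P equals (P x)^T.
    """
    d = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(d)) for i in range(d)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(d))
    return [[P[i][j] - Px[i] * Px[j] / denom for j in range(d)] for i in range(d)]


def select_from_stream(stream, dim, threshold, ridge=1.0):
    """Decide, one observation at a time, whether to request a label.

    Assumption: an observation is labeled when its leverage exceeds
    `threshold`; otherwise it is discarded and never revisited.
    """
    # Start from (ridge * I)^{-1} so the inverse exists before any labels arrive.
    P = [[1.0 / ridge if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    queried = []
    for t, x in enumerate(stream):
        if leverage(P, x) > threshold:
            queried.append(t)                  # request the label for x
            P = sherman_morrison_update(P, x)  # x now counts as labeled
    return queried, P


# Toy stream: repeated directions become uninformative once labeled.
toy_stream = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
queried, P = select_from_stream(toy_stream, dim=2, threshold=0.6)
print(queried)  # [0, 2]
```

In this toy run, the second and fourth observations repeat a direction that has already been labeled, so their leverage drops to 0.5 and they are discarded. Raising the threshold lowers the labeling rate, at the cost of a less well-conditioned design.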
| Original language | English |
|---|---|
| Publisher | Technical University of Denmark |
| Number of pages | 214 |
| Publication status | Published - 2023 |
Projects
- Data Analytics in Production
  Cacciarelli, D. (PhD Student), Külahci, M. (Main Supervisor), Tyssedal, J. (Supervisor), Costantino, F. (Examiner) & Iosifidis, A. (Examiner)
  01/11/2020 → 07/05/2024
  Project: PhD (finished)