Abstract
The proliferation of automated data collection schemes and the advances in
sensorics are increasing the amount of data we are able to monitor in
real-time. However, given the high annotation costs and the time required by
quality inspections, data is often available in an unlabeled form. This is
fostering the use of active learning for the development of soft sensors and
predictive models. In production, instead of performing random inspections to
obtain product information, labels are collected by evaluating the information
content of the unlabeled data. Several query strategy frameworks for regression
have been proposed in the literature but most of the focus has been dedicated
to the static pool-based scenario. In this work, we propose a new strategy for
the stream-based scenario, where instances are sequentially offered to the
learner, which must instantaneously decide whether to perform the quality check
to obtain the label or discard the instance. The approach is inspired by the
optimal experimental design theory and the iterative aspect of the
decision-making process is tackled by setting a threshold on the
informativeness of the unlabeled data points. The proposed approach is
evaluated using numerical simulations and the Tennessee Eastman Process
simulator. The results confirm that selecting the examples suggested by the
proposed algorithm allows for a faster reduction in the prediction error.
Original language | English |
---|---|
Article number | 109664 |
Journal | Knowledge-Based Systems |
Volume | 254 |
Number of pages | 14 |
ISSN | 0950-7051 |
DOIs | |
Publication status | Published - 2022 |
Keywords
- Active learning
- Unlabeled data
- Optimal experimental design
- Linear regression
- Soft sensor