Abstract
In the era of big data, companies are increasingly driven to amass vast amounts of data, particularly in process industries where advanced sensor technologies are prevalent. However, obtaining accurate labels or product information through quality inspections can be prohibitively expensive. Active learning is a promising approach to optimizing data sampling by prioritizing the most informative data points. Active learning strategies, however, rely heavily on predictive models that are updated iteratively. In line with the principles of data-centric AI, this study highlights the detrimental effects of passively incorporating all available process variables into the predictive model that guides data collection. Specifically, in real-time sampling strategies based on online active learning, the inclusion of irrelevant features significantly hampers the efficiency of the learning process.
Original language | English |
---|---|
Title of host publication | Proceedings of 5th International Conference on Transdisciplinary AI (TransAI) |
Publisher | IEEE |
Publication date | 2023 |
Pages | 243-248 |
ISBN (Print) | 979-8-3503-5802-5 |
Publication status | Published - 2023 |
Event | 2023 Fifth International Conference on Transdisciplinary AI - Hills Hotel, Laguna Hills, United States. Duration: 25 Sept 2023 → 27 Sept 2023 |
Conference
Conference | 2023 Fifth International Conference on Transdisciplinary AI |
---|---|
Location | Hills Hotel |
Country/Territory | United States |
City | Laguna Hills |
Period | 25/09/2023 → 27/09/2023 |
Keywords
- Data-centric AI
- Active learning
- Unlabeled data
- Data streams
- Feature selection
- Design of experiments