Abstract
This paper proposes a scalable technique for developing lightweight yet powerful models for object detection in videos using self-training with knowledge distillation. The approach trains a compact student model on pseudo-labels generated by a computationally complex but generic teacher model, which reduces the need for massive amounts of data and computational power. However, model-based annotations in large-scale applications may propagate errors or biases. To address these issues, our paper introduces Stream-Based Active Distillation (SBAD), which endows pre-trained students with effective and efficient fine-tuning methods that are robust to teacher imperfections. The proposed pipeline: (i) adapts a pre-trained student model to a specific use case, based on a set of frames whose pseudo-labels are predicted by the teacher, and (ii) selects on-the-fly, along a streamed video, the images that should be used to fine-tune the student model. Various selection strategies are compared, demonstrating: (1) the effectiveness of implementing distillation with pseudo-labels, and (2) the importance of selecting images on which the pre-trained student makes high-confidence detections.
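The two pipeline steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `budget` and `threshold` parameters, and the choice of the maximum detection confidence as the frame score are all assumptions made for illustration, since the abstract does not specify the scoring rule. The detectors are passed in as plain callables that return `(label, confidence)` pairs.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Frame = TypeVar("Frame")
Detection = Tuple[str, float]  # (label, confidence); hypothetical format


def select_frames(
    stream: Iterable[Frame],
    student_detect: Callable[[Frame], List[Detection]],
    budget: int,
    threshold: float,
) -> List[Frame]:
    """Single pass over the stream: keep a frame when the student's most
    confident detection reaches `threshold`, until `budget` frames are kept.
    The max-confidence score is an assumed stand-in for the paper's rule."""
    selected: List[Frame] = []
    for frame in stream:
        if len(selected) >= budget:
            break  # budget exhausted, stop consuming the stream
        detections = student_detect(frame)
        score = max((conf for _, conf in detections), default=0.0)
        if score >= threshold:
            selected.append(frame)
    return selected


def pseudo_label(
    frames: Iterable[Frame],
    teacher_detect: Callable[[Frame], List[Detection]],
) -> List[Tuple[Frame, List[Detection]]]:
    """Pair each selected frame with the teacher's predictions, which then
    serve as pseudo-labels for fine-tuning the student."""
    return [(frame, teacher_detect(frame)) for frame in frames]
```

Only the selected frames are sent to the teacher, so the expensive model runs at most `budget` times, while the decision to keep a frame is made as each frame arrives rather than after the whole video has been seen.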
Original language | English |
---|---|
Title of host publication | Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) |
Publisher | IEEE |
Publication date | 2023 |
Pages | 4999-5007 |
ISBN (Print) | 979-8-3503-0250-9 |
ISBN (Electronic) | 979-8-3503-0249-3 |
Publication status | Published - 2023 |
Event | 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops - Vancouver, Canada. Duration: 17 Jun 2023 → 24 Jun 2023 |
Conference
Conference | 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops |
---|---|
Country/Territory | Canada |
City | Vancouver |
Period | 17/06/2023 → 24/06/2023 |