Abstract
The increased availability and maturity of head-mounted and wearable devices open up opportunities for remote communication and collaboration. However, the signal streams provided by these devices (e.g., head pose, hand pose, and gaze direction) do not represent a whole person. One of the main open problems is therefore how to leverage these signals to build faithful representations of the user. In this paper, we propose a method based on variational autoencoders to generate articulated poses of a human skeleton from noisy streams of head and hand pose. Our approach relies on a model of pose likelihood that is novel and theoretically well-grounded. We demonstrate on publicly available datasets that our method is effective even from very impoverished signals, and we investigate how pose prediction can be made more accurate and realistic.
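The core idea described in the abstract — conditioning a variational autoencoder on sparse head/hand signals to produce a full articulated pose — can be sketched as follows. This is a minimal illustrative forward pass, not the paper's architecture: all dimensions, layer shapes, and names are assumptions, and the weights are random stand-ins for trained parameters.

```python
import numpy as np

# Hypothetical dimensions -- illustrative only, not taken from the paper.
OBS_DIM = 18      # head + two hands, 3 devices x 6-DoF pose (assumption)
POSE_DIM = 63     # full skeleton, 21 joints x 3 rotation params (assumption)
LATENT_DIM = 8    # VAE latent size (assumption)

rng = np.random.default_rng(0)

# Randomly initialized weights stand in for trained parameters.
W_enc = rng.normal(0.0, 0.1, (OBS_DIM, 2 * LATENT_DIM))
b_enc = np.zeros(2 * LATENT_DIM)
W_dec = rng.normal(0.0, 0.1, (LATENT_DIM + OBS_DIM, POSE_DIM))
b_dec = np.zeros(POSE_DIM)

def encode(obs):
    """Map the sparse observation to latent mean and log-variance."""
    h = obs @ W_enc + b_enc
    return h[:LATENT_DIM], h[LATENT_DIM:]

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps, keeping the sample differentiable
    with respect to (mu, logvar) in a training setting."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, obs):
    """Generate a full-body pose, conditioned on the latent sample
    and on the observed head/hand stream."""
    return np.concatenate([z, obs]) @ W_dec + b_dec

obs = rng.standard_normal(OBS_DIM)        # a noisy head/hand signal frame
mu, logvar = encode(obs)
pose = decode(reparameterize(mu, logvar), obs)
print(pose.shape)
```

At inference time one could sample several latents per frame to obtain a distribution over plausible full-body poses consistent with the same impoverished signal.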
| Original language | English |
|---|---|
| Title of host publication | Proceedings of 2021 International Conference on Computer Vision |
| Publisher | IEEE |
| Publication date | 2021 |
| Pages | 11687-11697 |
| Publication status | Published - 2021 |
| Event | 2021 International Conference on Computer Vision, Virtual event, 11 Oct 2021 → 17 Oct 2021, https://iccv2021.thecvf.com/ |
Conference
| Conference | 2021 International Conference on Computer Vision |
|---|---|
| Location | Virtual event |
| Period | 11/10/2021 → 17/10/2021 |
| Internet address | https://iccv2021.thecvf.com/ |