Abstract
Speech emotion recognition (SER) is the task of inferring an individual's emotional state from speech signals. SER continues to attract interest because of its wide applicability. Although the field builds mainly on signal processing, machine learning, and deep learning methods, generalizing across languages remains a challenge. To improve cross-language performance, we propose a semi-supervised denoising autoencoder that uses a continuous metric loss. The novelty of this work lies in the continuous metric learning formulation, which is, to the best of our knowledge, among the first proposals on the topic. Furthermore, we contribute labels corresponding to the dimensional emotion model, which were used to evaluate the quality of the embedding (the labels will be made available by the time of publication). We show that the proposed method consistently outperforms the baseline method in terms of classification accuracy and correlation with the dimensional variables.
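The abstract does not spell out the loss itself, so the following is only a minimal sketch of one way a continuous metric loss could be combined with a denoising autoencoder objective. It assumes that distances between embeddings should match distances between continuous (dimensional) labels, and that unlabelled samples contribute only to the reconstruction term. The function names, the squared-error form, and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def continuous_metric_loss(embeddings, labels):
    """Assumed form: mean squared gap between pairwise embedding distances
    and pairwise distances of the continuous labels."""
    pairs = list(combinations(range(len(embeddings)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        d_emb = math.sqrt(sq_dist(embeddings[i], embeddings[j]))
        d_lab = math.sqrt(sq_dist(labels[i], labels[j]))
        total += (d_emb - d_lab) ** 2
    return total / len(pairs)


def semi_supervised_loss(clean_inputs, reconstructions, embeddings, labels, weight=1.0):
    """Reconstruction error on all samples plus the metric term on labelled ones.

    `reconstructions` are assumed to be decoder outputs for noise-corrupted
    versions of `clean_inputs` (the denoising part); `labels[i]` is None
    for an unlabelled sample.
    """
    recon = sum(sq_dist(x, r) for x, r in zip(clean_inputs, reconstructions)) / len(clean_inputs)
    labelled = [(e, l) for e, l in zip(embeddings, labels) if l is not None]
    metric = continuous_metric_loss([e for e, _ in labelled], [l for _, l in labelled])
    return recon + weight * metric
```

In this sketch the metric term vanishes when embedding distances already mirror label distances, so the labelled samples shape the geometry of the embedding while every sample, labelled or not, still drives the reconstruction term.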
Original language | English |
---|---|
Title of host publication | Proceedings of the Northern Lights Deep Learning Workshop 2022 |
Number of pages | 10 |
Volume | 3 |
Publication date | 2022 |
Publication status | Published - 2022 |
Event | Northern Lights Deep Learning Workshop 2022, Tromsø, Norway (10 Jan 2022 → 12 Jan 2022) |
Conference
Conference | Northern Lights Deep Learning Workshop 2022 |
---|---|
Country/Territory | Norway |
City | Tromsø |
Period | 10/01/2022 → 12/01/2022 |
Keywords
- Speech emotion recognition
- Transferability
- Continuous metric learning
- Dimensional emotion model
- Low-resource machine learning