Abstract
Deep learning has driven rapid advances in speech emotion recognition (SER), enabling its deployment across a wide range of applications and sectors. However, long-standing challenges such as generalizing to unseen corpora and languages, together with newer challenges such as the lack of interpretability and transparency of deep learning models, impact the security of these methods and thereby reduce their usability and acceptability in real-world applications. Here, we address this gap by investigating how the formulation and design of the learning function influence the ability to transfer emotion representations learned in one language to other languages. Furthermore, we examine the importance of the different feature groups for the emotion classes, and the associations between the feature groups and the learning functions. From the evaluation, we conclude that the dimensional model of emotion, specifically activation, transfers better to unseen languages than either emotion classes or valence. However, this transferability does not necessarily translate to higher classification accuracy.
Original language | English |
---|---|
Title of host publication | Proceedings of 2nd Symposium on Security and Privacy in Speech Communication |
Number of pages | 7 |
Publication date | 2022 |
Publication status | Published - 2022 |
Event | 2nd Symposium on Security and Privacy in Speech Communication - Incheon National University, Incheon, Korea, Republic of; Duration: 23 Sep 2022 → 24 Sep 2022 |
Conference
Conference | 2nd Symposium on Security and Privacy in Speech Communication |
---|---|
Location | Incheon National University |
Country/Territory | Korea, Republic of |
City | Incheon |
Period | 23/09/2022 → 24/09/2022 |