Zero-shot Cross-lingual Speech Emotion Recognition: A Study of Loss Functions and Feature Importance

Sneha Das, Nicole Nadine Lonfeldt, Nicklas Leander Lund, Anne Katrine Pagsberg, Line Katrine Harder Clemmensen

Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review

Abstract

Deep learning has led to the rapid advancement of speech emotion recognition (SER), enabling its deployment in a wide range of applications and sectors. However, conventional challenges such as generalizing over unseen corpora and languages, and newer challenges such as the lack of interpretability and transparency of deep learning models, impact the security of these methods and thereby reduce their usability and acceptability in real-world applications. Here, we address this gap by investigating how the formulation and design of the learning function influence the ability to transfer emotion representations learned in one language to other languages. Furthermore, we examine the importance of the different feature groups for the emotion classes, and the associations between the feature groups and the learning functions. From the evaluation, we conclude that the dimensional model of emotion, specifically activation, is more transferable over unseen languages than either emotion classes or valence. However, this transferability does not necessarily translate to higher classification accuracy.
Original language: English
Title of host publication: Proceedings of 2nd Symposium on Security and Privacy in Speech Communication
Number of pages: 7
Publication date: 2022
Publication status: Published - 2022
Event: 2nd Symposium on Security and Privacy in Speech Communication - Incheon National University, Incheon, Korea, Republic of
Duration: 23 Sep 2022 - 24 Sep 2022

Conference

Conference: 2nd Symposium on Security and Privacy in Speech Communication
Location: Incheon National University
Country/Territory: Korea, Republic of
City: Incheon
Period: 23/09/2022 - 24/09/2022
