Continuous Metric Learning For Transferable Speech Emotion Recognition and Embedding Across Low-resource Languages

Sneha Das, Nicklas Leander Lund, Nicole Nadine Lønfeldt, Anne Katrine Pagsberg, Line Katrine Harder Clemmensen

Research output: Chapter in Book/Report/Conference proceeding > Article in proceedings > Research > peer-review


Abstract

Speech emotion recognition (SER) refers to the technique of inferring the emotional state of an individual from speech signals. SER systems continue to attract interest because of their wide applicability. Although the field is built mainly on signal processing, machine learning and deep learning methods, generalizing across languages remains a challenge. To improve performance across languages, in this paper we propose a denoising autoencoder with semi-supervision via a continuous metric loss. The novelty of this work lies in the proposed continuous metric learning, which, to the best of our knowledge, is among the first proposals on the topic. Furthermore, we contribute labels corresponding to the dimensional emotion model, which were used to evaluate the quality of the embedding (the labels will be made available by the time of publication). We show that the proposed method consistently outperforms the baseline method in terms of classification accuracy and correlation with the dimensional variables.
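The abstract does not state the loss itself. As a hedged illustration only, the sketch below shows one way a semi-supervised denoising objective with a continuous metric term could be assembled: a reconstruction error computed on every utterance, plus a metric loss computed only on utterances that carry continuous dimensional labels (for example arousal and valence; the choice of dimensions is an assumption). For the metric term it swaps in a log-ratio loss, which asks the ratio of embedding distances to match the ratio of label distances; this is a stand-in and not necessarily the authors' formulation. All function names, the `weight` parameter, the Gaussian corruption, and the convention of marking unlabelled samples with `None` are hypothetical.

```python
import math
import random
from itertools import combinations


def add_noise(x, sigma=0.1, rng=None):
    """Corrupt a feature vector with Gaussian noise (the denoising input).

    Hypothetical corruption scheme; seeded by default for reproducibility.
    """
    rng = rng or random.Random(0)
    return [v + rng.gauss(0.0, sigma) for v in x]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def log_ratio_loss(embeddings, labels, eps=1e-8):
    """Continuous metric loss averaged over (anchor, i, j) triplets.

    For each triplet, the log-ratio of embedding distances is pushed towards
    the log-ratio of distances between continuous labels, so no binary
    similar/dissimilar split is needed. Stand-in loss, not the paper's.
    """
    n = len(embeddings)
    total, count = 0.0, 0
    for a in range(n):
        others = [k for k in range(n) if k != a]
        # Unordered pairs: (i, j) and (j, i) give the same squared term.
        for i, j in combinations(others, 2):
            emb = math.log((sq_dist(embeddings[a], embeddings[i]) + eps)
                           / (sq_dist(embeddings[a], embeddings[j]) + eps))
            lab = math.log((sq_dist(labels[a], labels[i]) + eps)
                           / (sq_dist(labels[a], labels[j]) + eps))
            total += (emb - lab) ** 2
            count += 1
    return total / count if count else 0.0


def mse(x, y):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def semi_supervised_loss(clean, reconstructed, embeddings, labels, weight=1.0):
    """Reconstruction loss on every sample plus metric loss on labelled ones.

    `reconstructed[k]` is assumed to be the decoder output for the noisy
    version of `clean[k]`; `labels[k]` is None for unlabelled samples
    (hypothetical convention for this sketch).
    """
    recon = sum(mse(c, r) for c, r in zip(clean, reconstructed)) / len(clean)
    labelled = [(e, y) for e, y in zip(embeddings, labels) if y is not None]
    if len(labelled) < 3:  # a triplet needs an anchor and two other samples
        return recon
    emb, lab = zip(*labelled)
    return recon + weight * log_ratio_loss(list(emb), list(lab))


if __name__ == "__main__":
    # Hypothetical (arousal, valence) labels for four utterances.
    labels = [[0.1, 0.2], [0.5, 0.9], [0.8, 0.3], [0.4, 0.6]]
    scaled = [[2 * a, 2 * v] for a, v in labels]
    # Close to 0: uniform scaling of the embedding keeps all distance ratios.
    print(log_ratio_loss(scaled, labels))
```

Because the log-ratio term only constrains relative distances, an embedding that is a uniformly scaled copy of the label space incurs almost no metric loss, while the reconstruction term still applies to unlabelled utterances; that is what makes the objective semi-supervised in this sketch.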

Original language: English
Title of host publication: Proceedings of the Northern Lights Deep Learning Workshop 2022
Number of pages: 10
Volume: 3
Publication date: 2022
Publication status: Published - 2022
Event: Northern Lights Deep Learning Workshop 2022 - Tromsø, Norway
Duration: 10 Jan 2022 - 12 Jan 2022

Conference

Conference: Northern Lights Deep Learning Workshop 2022
Country/Territory: Norway
City: Tromsø
Period: 10/01/2022 - 12/01/2022

Keywords

  • Speech emotion recognition
  • Transferability
  • Continuous metric learning
  • Dimensional emotion model
  • Low-resource machine learning

