Combining Semantic and Acoustic Features for Valence and Arousal Recognition in Speech

Publication: Research - peer-review › Article in proceedings – Annual report year: 2012

Standard

Combining Semantic and Acoustic Features for Valence and Arousal Recognition in Speech. / Karadogan, Seliz; Larsen, Jan.

2012 3rd International Workshop on Cognitive Information Processing (CIP). IEEE, 2012.

Harvard

Karadogan, S & Larsen, J 2012, 'Combining Semantic and Acoustic Features for Valence and Arousal Recognition in Speech', in 2012 3rd International Workshop on Cognitive Information Processing (CIP), IEEE. doi: 10.1109/CIP.2012.6232924

APA

Karadogan, S., & Larsen, J. (2012). Combining Semantic and Acoustic Features for Valence and Arousal Recognition in Speech. In 2012 3rd International Workshop on Cognitive Information Processing (CIP). IEEE. 10.1109/CIP.2012.6232924

CBE

Karadogan S, Larsen J. 2012. Combining Semantic and Acoustic Features for Valence and Arousal Recognition in Speech. In 2012 3rd International Workshop on Cognitive Information Processing (CIP). IEEE. Available from: 10.1109/CIP.2012.6232924

MLA

Karadogan, Seliz, and Jan Larsen. "Combining Semantic and Acoustic Features for Valence and Arousal Recognition in Speech." 2012 3rd International Workshop on Cognitive Information Processing (CIP). IEEE, 2012. Available: 10.1109/CIP.2012.6232924

Vancouver

Karadogan S, Larsen J. Combining Semantic and Acoustic Features for Valence and Arousal Recognition in Speech. In 2012 3rd International Workshop on Cognitive Information Processing (CIP). IEEE. 2012. Available from: 10.1109/CIP.2012.6232924

Author

Karadogan, Seliz; Larsen, Jan / Combining Semantic and Acoustic Features for Valence and Arousal Recognition in Speech.

2012 3rd International Workshop on Cognitive Information Processing (CIP). IEEE, 2012.

Bibtex

@inproceedings{c46236d708af49a2a6f14b9d127db316,
  title = "Combining Semantic and Acoustic Features for Valence and Arousal Recognition in Speech",
  author = "Seliz Karadogan and Jan Larsen",
  booktitle = "2012 3rd International Workshop on Cognitive Information Processing (CIP)",
  publisher = "IEEE",
  year = "2012",
  doi = "10.1109/CIP.2012.6232924",
  isbn = "978-1-4673-1877-8"
}

RIS

TY - GEN

T1 - Combining Semantic and Acoustic Features for Valence and Arousal Recognition in Speech

A1 - Karadogan,Seliz

A1 - Larsen,Jan

AU - Karadogan,Seliz

AU - Larsen,Jan

PB - IEEE

PY - 2012

Y1 - 2012

AB - The recognition of affect in speech has attracted considerable interest recently, especially in the cognitive and computer sciences. Most previous studies focused on recognizing basic emotions (such as happiness, sadness and anger) using a categorical approach. Recently, the focus has been shifting towards dimensional affect recognition, based on the idea that emotional states are not independent of one another but related in a systematic manner. In this paper, we design a continuous dimensional speech affect recognition model that combines acoustic and semantic features. We design our own corpus, which consists of 59 short movie clips with audio and text in subtitle format, rated by human subjects in the arousal and valence (A-V) dimensions. For the acoustic part, we combine many features, use correlation-based feature selection and apply support vector regression. For the semantic part, we use the Affective Norms for English Words (ANEW), which are also rated in the A-V dimensions, as keywords and apply latent semantic analysis (LSA) to those words and the words in the clips to estimate the A-V values of the clips. Finally, the results of the acoustic and semantic parts are combined. We show that combining semantic and acoustic information for dimensional speech affect recognition improves the results. Moreover, we show that valence is better estimated using semantic features, while arousal is better estimated using acoustic features.

U2 - 10.1109/CIP.2012.6232924

DO - 10.1109/CIP.2012.6232924

SN - 978-1-4673-1877-8

BT - 2012 3rd International Workshop on Cognitive Information Processing (CIP)

T2 - 2012 3rd International Workshop on Cognitive Information Processing (CIP)

ER -
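
Illustration (not part of the publication record): the abstract states that the outputs of the acoustic and semantic parts are combined, but it does not say how. The sketch below assumes one simple possibility, a per-dimension weighted average of the two estimates. The names AffectEstimate and fuse_estimates, the default weights and the example ratings are all hypothetical and are not taken from the paper. The defaults lean on semantics for valence and on acoustics for arousal only to mirror the qualitative finding reported in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffectEstimate:
    """Valence/arousal estimate for one clip (rating scale is an assumption)."""
    valence: float
    arousal: float


def fuse_estimates(
    acoustic: AffectEstimate,
    semantic: AffectEstimate,
    semantic_weight_valence: float = 0.7,  # hypothetical default
    semantic_weight_arousal: float = 0.3,  # hypothetical default
) -> AffectEstimate:
    """Combine two estimates with a per-dimension weighted average.

    This is an assumed fusion rule for illustration only; the abstract
    does not specify the combination method used by the authors.
    """
    for weight in (semantic_weight_valence, semantic_weight_arousal):
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weights must lie in [0, 1]")

    # Each dimension mixes the semantic and acoustic estimates; the
    # acoustic weight is the complement of the semantic weight.
    valence = (
        semantic_weight_valence * semantic.valence
        + (1.0 - semantic_weight_valence) * acoustic.valence
    )
    arousal = (
        semantic_weight_arousal * semantic.arousal
        + (1.0 - semantic_weight_arousal) * acoustic.arousal
    )
    return AffectEstimate(valence=valence, arousal=arousal)


if __name__ == "__main__":
    # Made-up ratings for a single clip, purely for demonstration.
    acoustic = AffectEstimate(valence=4.0, arousal=7.0)
    semantic = AffectEstimate(valence=6.0, arousal=5.0)
    print(fuse_estimates(acoustic, semantic))
```

In practice the weights would be tuned on held-out clips rather than fixed by hand; setting a weight to 0 or 1 recovers the purely acoustic or purely semantic estimate for that dimension.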