Music consists of several structures and patterns evolving through time, which greatly influence how humans decode higher-level cognitive aspects of music, such as the emotions it expresses. For tasks such as genre, tag, and emotion recognition, these structures have often been identified and used as individual, non-temporal features and representations. In this work, we test the hypothesis that using multiple temporal and non-temporal representations of different features is beneficial for modeling music structure, with the aim of predicting the emotions expressed in music. We test this hypothesis by representing temporal and non-temporal structures using generative models of multiple audio features. These representations are used in a discriminative setting via the Probability Product Kernel and a Gaussian Process model enabling Multiple Kernel Learning, finding optimized combinations of both features and temporal/non-temporal representations. We show increased predictive performance from combining different features and representations, along with the strong interpretability this approach offers.
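The pipeline the abstract describes can be illustrated in miniature: each track's feature distribution is summarized by a generative model (here a single Gaussian per track, a simplification), pairwise similarities are computed with the Probability Product Kernel in closed form, and the per-feature kernel matrices are combined as in Multiple Kernel Learning. This is a hedged sketch, not the paper's implementation: the track data, feature names, and fixed mixture weights are illustrative assumptions (in the paper the weights are learned within the Gaussian Process model).

```python
import numpy as np
from scipy.stats import multivariate_normal

def ppk_gaussian(mu1, cov1, mu2, cov2):
    # Closed-form Probability Product Kernel (rho = 1, "expected likelihood")
    # between two Gaussians: integral p(x) q(x) dx = N(mu1; mu2, cov1 + cov2).
    return multivariate_normal.pdf(mu1, mean=mu2, cov=cov1 + cov2)

rng = np.random.default_rng(0)

# Hypothetical per-track Gaussian summaries of two different audio features
# (e.g. one timbre-like and one chroma-like representation), 4 tracks each.
n_tracks, dim = 4, 3
features = []
for _ in range(2):
    mus = rng.normal(size=(n_tracks, dim))
    covs = [np.eye(dim) * rng.uniform(0.5, 1.5) for _ in range(n_tracks)]
    features.append((mus, covs))

# One PPK Gram matrix per feature representation.
kernels = []
for mus, covs in features:
    K = np.array([[ppk_gaussian(mus[i], covs[i], mus[j], covs[j])
                   for j in range(n_tracks)] for i in range(n_tracks)])
    kernels.append(K)

# MKL-style combination: a convex mixture of the per-feature kernels.
# The weights here are fixed for illustration; the paper optimizes them.
weights = np.array([0.7, 0.3])
K_combined = sum(w * K for w, K in zip(weights, kernels))
print(K_combined.shape)
```

The combined Gram matrix can then be plugged into any kernel-based discriminative model; because each weight is tied to one feature representation, the learned weights indicate which representations matter most, which is the interpretability aspect the abstract mentions.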
Title of host publication: Proceedings of the 1st International Workshop on Affect & Sentiment in Multimedia (ASM '15)
Publisher: Association for Computing Machinery
Publication status: Published - 2015
Event: 1st International Workshop on Affect and Sentiment in Multimedia (ASM '15), Brisbane, Australia
Duration: 26 Oct 2015 → 30 Oct 2015
Conference number: 1
- Music emotion prediction
- Expressed emotions
- Pairwise comparisons
- Multiple kernel learning
- Gaussian process
Madsen, J., Jensen, B. S., & Larsen, J. (2015). Learning Combinations of Multiple Feature Representations for Music Emotion Prediction. In Proceedings of the 1st International Workshop on Affect & Sentiment in Multimedia (ASM '15) (pp. 3-8). Association for Computing Machinery.