Learning Combinations of Multiple Feature Representations for Music Emotion Prediction

Jens Madsen, Bjørn Sand Jensen, Jan Larsen

Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review


Music consists of several structures and patterns evolving through time, which greatly influence the human decoding of higher-level cognitive aspects of music, such as the emotions it expresses. For tasks such as genre, tag, and emotion recognition, these structures have often been identified and used as individual, non-temporal features and representations. In this work, we address the hypothesis that using multiple temporal and non-temporal representations of different features is beneficial for modeling music structure when the aim is to predict the emotions expressed in music. We test this hypothesis by representing temporal and non-temporal structures using generative models of multiple audio features. The representations are used in a discriminative setting via the Probability Product Kernel and the Gaussian Process model, which enables Multiple Kernel Learning and finds optimized combinations of both features and temporal/non-temporal representations. We show increased predictive performance when combining different features and representations, along with the strong interpretive prospects of this approach.
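As a rough illustration of the kernel step described in the abstract, the sketch below computes a probability product kernel between two Gaussian densities and forms a non-negative weighted sum of kernel matrices. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each excerpt is summarized by a single full-covariance Gaussian (the abstract only says "generative models"), and the function names `ppk_gaussian` and `combined_kernel`, the parameter `rho`, and the fixed weights are all hypothetical. In a Multiple Kernel Learning setting the weights would be optimized rather than set by hand.

```python
import numpy as np


def ppk_gaussian(mu1, cov1, mu2, cov2, rho=0.5):
    """Probability product kernel k(p, q) = integral of p(x)^rho * q(x)^rho dx
    for two Gaussians, using the standard closed form.
    Assumption: one full-covariance Gaussian per excerpt (illustrative only)."""
    mu1 = np.atleast_1d(np.asarray(mu1, dtype=float))
    mu2 = np.atleast_1d(np.asarray(mu2, dtype=float))
    cov1 = np.atleast_2d(np.asarray(cov1, dtype=float))
    cov2 = np.atleast_2d(np.asarray(cov2, dtype=float))
    d = mu1.size

    prec1 = np.linalg.inv(cov1)
    prec2 = np.linalg.inv(cov2)
    cov_dag = np.linalg.inv(prec1 + prec2)   # (cov1^-1 + cov2^-1)^-1
    mu_dag = prec1 @ mu1 + prec2 @ mu2       # cov1^-1 mu1 + cov2^-1 mu2

    # Work in the log domain for numerical stability.
    log_k = (
        (1.0 - 2.0 * rho) * d / 2.0 * np.log(2.0 * np.pi)
        - d / 2.0 * np.log(rho)
        + 0.5 * np.linalg.slogdet(cov_dag)[1]
        - rho / 2.0 * (np.linalg.slogdet(cov1)[1] + np.linalg.slogdet(cov2)[1])
        - rho / 2.0 * (
            mu1 @ prec1 @ mu1 + mu2 @ prec2 @ mu2 - mu_dag @ cov_dag @ mu_dag
        )
    )
    return float(np.exp(log_k))


def combined_kernel(kernels, weights):
    """Non-negative weighted sum of kernel matrices (hypothetical helper).
    The weights are fixed inputs here; learning them is out of scope."""
    weights = np.asarray(weights, dtype=float)
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel matrix")
    if np.any(weights < 0):
        raise ValueError("kernel weights must be non-negative")
    return sum(w * np.asarray(K, dtype=float) for w, K in zip(weights, kernels))
```

With `rho=0.5` the kernel reduces to the Bhattacharyya affinity, so comparing a density with itself should give a value close to 1. Because a non-negative weighted sum of valid kernels is itself a valid kernel, the combined matrix can be passed to any kernel method, and the relative size of the weights gives a rough indication of which representation contributes most.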
Original language: English
Title of host publication: Proceedings of the 1st International Workshop on Affect & Sentiment in Multimedia (ASM '15)
Publisher: Association for Computing Machinery
Publication date: 2015
ISBN (Print): 978-1-4503-3750-2
Publication status: Published - 2015
Event: 1st International Workshop on Affect and Sentiment in Multimedia (ASM '15) - Brisbane, Australia
Duration: 26 Oct 2015 to 30 Oct 2015
Conference number: 1




Keywords:

  • Music emotion prediction
  • Expressed emotions
  • Pairwise comparisons
  • Multiple kernel learning
  • Gaussian process
