Automatic music genre classification is the task of having a computer assign a piece of music to its corresponding genre (such as jazz or rock). It is considered a cornerstone of the research area of Music Information Retrieval (MIR) and is closely linked to the other areas of MIR. It is widely thought that MIR will become a key element in the processing, searching and retrieval of digital music in the near future.
This dissertation is concerned with music genre classification systems, in particular systems that use the raw audio signal as input to estimate the corresponding genre. This is in contrast to systems that use, e.g., a symbolic representation of or textual information about the music. The approach taken here has been system-oriented: all the different aspects of the systems have been considered, with an emphasis on the systems being applicable to ordinary real-world music collections.
The music genre classification systems considered here can be seen as a feature representation of the song followed by a classifier that predicts the genre. The feature representation is in turn split into short-time feature extraction followed by temporal feature integration, which combines the (multivariate) time series of short-time feature vectors into feature vectors on a larger time scale.
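To make the two-stage feature representation concrete, the sketch below shows one of the simplest forms of temporal feature integration: sliding a window over the short-time feature vectors and summarising each window by its mean and variance. This is a common baseline and not the integration method proposed in this dissertation. The function name `integrate_mean_var` and the `win`/`hop` parameters (measured in short-time frames) are hypothetical choices made for illustration.

```python
import numpy as np

def integrate_mean_var(short_time_feats: np.ndarray, win: int, hop: int) -> np.ndarray:
    """Hypothetical baseline: temporal feature integration by windowed mean and variance.

    short_time_feats: shape (n_frames, n_dims), e.g. one feature vector per
    10-40 ms frame. Returns shape (n_windows, 2 * n_dims), one integrated
    feature vector per window of `win` frames, with windows `hop` frames apart.
    """
    n_frames, _ = short_time_feats.shape
    if n_frames < win:
        raise ValueError("need at least `win` short-time frames")
    out = []
    for start in range(0, n_frames - win + 1, hop):
        block = short_time_feats[start:start + win]
        # Stack the per-dimension mean and variance into one larger-scale vector.
        out.append(np.concatenate([block.mean(axis=0), block.var(axis=0)]))
    return np.vstack(out)
```

For example, 100 frames of 13-dimensional features with `win=40` and `hop=20` give 4 integrated vectors of dimension 26. A classifier then operates on these longer-time-scale vectors instead of on the individual short-time frames.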
Several different short-time features with frame sizes of 10-40 ms have been examined and ranked according to their significance for music genre classification. A Consensus sensitivity analysis method was proposed for feature ranking. Its advantage is that it combines the sensitivities obtained over several resamplings into a single ranking.
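The sketch below illustrates the general idea of combining sensitivities from several resamplings into a single ranking. It uses plain average-rank aggregation, which is an assumption made here for illustration and is not necessarily how the Consensus sensitivity analysis method combines them. The function name `consensus_ranking` is hypothetical, and the input is assumed to hold one non-negative importance score per feature and per resampling, with larger values meaning more important.

```python
import numpy as np

def consensus_ranking(sensitivities: np.ndarray) -> np.ndarray:
    """Hypothetical average-rank aggregation of per-resampling sensitivities.

    sensitivities: shape (n_resamplings, n_features); larger = more important.
    Returns feature indices ordered from most to least important.
    """
    n_res, n_feat = sensitivities.shape
    # Within each resampling, order features by decreasing sensitivity.
    order = np.argsort(-sensitivities, axis=1)
    # Invert the ordering so ranks[r, f] is the rank of feature f (0 = best).
    ranks = np.empty_like(order)
    rows = np.arange(n_res)[:, None]
    ranks[rows, order] = np.arange(n_feat)
    # Average over resamplings; a lower mean rank means a more important feature.
    mean_rank = ranks.mean(axis=0)
    return np.argsort(mean_rank, kind="stable")
```

Averaging ranks rather than raw sensitivities keeps a single resampling with unusually large values from dominating the combined ranking.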
The main effort has been in temporal feature integration. Two general frameworks have been proposed: the Dynamic Principal Component Analysis model and the Multivariate Autoregressive Model for temporal feature integration. The Multivariate Autoregressive Model in particular proved successful and outperformed a selection of state-of-the-art temporal feature integration methods. For instance, an accuracy of 48% was achieved on an 11-genre problem, compared with 57% for human performance.
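As a rough illustration of the multivariate autoregressive idea, the sketch below fits a model $x_t = v + \sum_{p=1}^{P} A_p x_{t-p} + e_t$ to a window of short-time feature vectors and uses the fitted parameters as the integrated feature vector. These parameters are the intercept $v$, the coefficient matrices $A_p$, and the upper triangle of the residual covariance. Several details are assumptions made for this example rather than the dissertation's exact formulation: estimation by ordinary least squares, the parameter layout, and the name `mar_features`. The window length and model order $P$ would in practice be tuning choices.

```python
import numpy as np

def mar_features(window: np.ndarray, order: int) -> np.ndarray:
    """Hypothetical MAR integration of one window of short-time features.

    window: shape (n_frames, d); order: model order P.
    Returns [vec(W), upper triangle of C], where row 0 of W is the intercept v,
    rows 1..P*d hold the transposed A_p blocks, and C is the residual covariance.
    """
    n, d = window.shape
    if n - order < 1 + order * d:
        raise ValueError("window too short for the requested model order")
    # Regressor for time t: [1, x_{t-1}, ..., x_{t-P}]
    X = np.asarray([
        np.concatenate([[1.0]] + [window[t - p] for p in range(1, order + 1)])
        for t in range(order, n)
    ])                                    # shape (n - P, 1 + P*d)
    Y = window[order:]                    # shape (n - P, d)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least-squares fit, (1 + P*d, d)
    resid = Y - X @ W
    C = resid.T @ resid / (n - order)     # residual (noise) covariance estimate
    iu = np.triu_indices(d)               # C is symmetric: keep upper triangle
    return np.concatenate([W.ravel(), C[iu]])
```

Unlike the mean and variance of a window, the coefficients $A_p$ capture how the short-time features evolve over time within the window, for example periodicities related to rhythm. This is one plausible reason such a model can outperform simpler integration schemes.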
A selection of classifiers was examined and compared. We introduced Co-occurrence models for music genre classification. These models include the whole song within a probabilistic framework, which is often an advantage over many traditional classifiers that only model the individual feature vectors in a song.