Abstract
In recent years, deep neural networks (DNNs) have become a popular choice for audio content analysis. This may be attributed to various factors, including advancements in training algorithms, increased computational power, and the potential for DNNs to implicitly learn a set of feature detectors. We have recently re-examined two works that consider DNNs for the task of music genre recognition (MGR). These papers conclude that frame-level features learned by DNNs offer an improvement over traditional, hand-crafted features such as Mel-frequency cepstral coefficients (MFCCs). However, these conclusions were drawn from training and testing on the GTZAN dataset, which is now known to contain several flaws, including replicated observations and repeated artists. We illustrate how accounting for these flaws dramatically changes the results, which leads one to question the degree to which the learned frame-level features are actually useful for MGR. We make available a reproducible software package allowing other researchers to duplicate our figures and results exactly.
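One way to read the abstract's central caveat is as a data-splitting issue: because GTZAN replicates excerpts and artists, a naive random split can place the same artist (or even the same recording) on both sides of the train/test divide, inflating apparent accuracy. The sketch below is our illustration, not the authors' released package; it shows one common remedy, a group-aware (artist-filtered) split, using scikit-learn's `GroupShuffleSplit` on hypothetical feature, label, and artist arrays.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical data: 1000 excerpts, 13 MFCC-like features each,
# with a genre label and an artist ID per excerpt (all synthetic).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 13))           # placeholder features
y = rng.integers(0, 10, size=1000)        # 10 genre labels, as in GTZAN
artists = rng.integers(0, 60, size=1000)  # artist ID for each excerpt

# Group-aware split: no artist appears in both train and test,
# preventing the "artist effect" from inflating reported accuracy.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=artists))

# Sanity check: the two artist sets are disjoint.
assert set(artists[train_idx]).isdisjoint(artists[test_idx])
```

Any group-aware splitter would serve here; the essential point is that the grouping variable (artist identity, or replica identity) must never straddle the train/test boundary.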
Original language | English |
---|---|
Publication date | 2014 |
Publication status | Published - 2014 |
Event | DMRN+9: Digital Music Research Network One-day Workshop 2014, London, United Kingdom |
Duration | 16 Dec 2014 → 16 Dec 2014 |
Workshop
Workshop | DMRN+9: Digital Music Research Network One-day Workshop 2014 |
---|---|
Country/Territory | United Kingdom |
City | London |
Period | 16/12/2014 → 16/12/2014 |