Are deep neural networks really learning relevant features?

Corey Mose Kereliuk, Jan Larsen, Bob L. Sturm

Research output: Contribution to conference › Paper › Research

Abstract

In recent years, deep neural networks (DNNs) have become a popular choice for audio content analysis. This may be attributed to various factors, including advancements in training algorithms, computational power, and the potential for DNNs to implicitly learn a set of feature detectors. We have recently re-examined two works that consider DNNs for the task of music genre recognition (MGR). These papers conclude that frame-level features learned by DNNs offer an improvement over traditional, hand-crafted features such as Mel-frequency cepstral coefficients (MFCCs). However, these conclusions were drawn from training and testing on the GTZAN dataset, which is now known to contain several flaws, including replicated observations and artists. We illustrate how accounting for these flaws dramatically changes the results, which leads one to question the degree to which the learned frame-level features are actually useful for MGR. We make available a reproducible software package that allows other researchers to fully reproduce our figures and results.
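One way to picture the artist flaw mentioned in the abstract: if excerpts by the same artist land in both the training and test partitions, a classifier can score well by recognising the artist instead of the genre. The sketch below is not the authors' software package; it is a minimal, assumed illustration of an artist-filtered split in which every artist is confined to one partition. The function name `artist_filtered_split`, the excerpt ids, and the artist labels are all hypothetical, and replicated excerpts would still need to be removed separately.

```python
import random
from collections import defaultdict


def artist_filtered_split(tracks, test_fraction=0.3, seed=0):
    """Split (track_id, artist) pairs so no artist appears in both sets."""
    # Group excerpt ids by artist so each artist can be assigned as a unit.
    by_artist = defaultdict(list)
    for track_id, artist in tracks:
        by_artist[artist].append(track_id)

    # Shuffle artists (not excerpts) with a fixed seed for repeatability.
    artists = sorted(by_artist)
    random.Random(seed).shuffle(artists)

    target = test_fraction * len(tracks)
    train, test = [], []
    for artist in artists:
        # Fill the test set with whole artists until it reaches the target
        # size; every remaining artist goes to the training set.
        bucket = test if len(test) < target else train
        bucket.extend(by_artist[artist])
    return train, test


# Hypothetical toy metadata: (excerpt id, artist label).
tracks = [
    ("rock.00000", "artist_a"), ("rock.00001", "artist_a"),
    ("rock.00002", "artist_b"), ("jazz.00000", "artist_c"),
    ("jazz.00001", "artist_c"), ("jazz.00002", "artist_d"),
]
train, test = artist_filtered_split(tracks)
```

Because whole artists are assigned at once, the test set size only approximates `test_fraction`; this sketch accepts that in exchange for a guarantee that no artist is shared between partitions.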
Original language: English
Publication date: 2014
Publication status: Published - 2014
Event: DMRN+9: Digital Music Research Network One-day Workshop 2014 - London, United Kingdom
Duration: 16 Dec 2014 to 16 Dec 2014

Workshop

Workshop: DMRN+9: Digital Music Research Network One-day Workshop 2014
Country/Territory: United Kingdom
City: London
Period: 16/12/2014 to 16/12/2014
