Exploiting deep neural networks and head movements for binaural localisation of multiple speakers in reverberant conditions

Ning Ma, Guy J. Brown, Tobias May

    Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review


    Abstract

    This paper presents a novel machine-hearing system that exploits deep neural networks (DNNs) and head movements for binaural localisation of multiple speakers in reverberant conditions. DNNs are used to map binaural features, consisting of the complete cross-correlation function (CCF) and interaural level differences (ILDs), to the source azimuth. Our approach was evaluated using a localisation task in which sources were located in a full 360-degree azimuth range. As a result, front-back confusions often occurred due to the similarity of binaural features in the front and rear hemifields. To address this, a head movement strategy was incorporated in the DNN-based model to help reduce the front-back errors. Our experiments show that, compared to a system based on a Gaussian mixture model (GMM) classifier, the proposed DNN system substantially reduces localisation errors under challenging acoustic scenarios in which multiple speakers and room reverberation are present.
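The two binaural features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single broadband frame with no auditory filterbank, a hypothetical helper name `binaural_features`, and an assumed lag window of plus or minus 1 ms. It shows only how a normalised CCF and an ILD in dB might be computed from a left and right ear signal with NumPy.

```python
import numpy as np


def binaural_features(left, right, fs, max_lag_ms=1.0):
    """Return (lags, ccf, ild_db) for one frame of a two-channel signal.

    Hypothetical helper for illustration; the +/- 1 ms lag window and the
    broadband (single-band) processing are assumptions, not taken from the paper.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if left.shape != right.shape:
        raise ValueError("left and right must have the same length")

    max_lag = int(round(max_lag_ms * 1e-3 * fs))

    # Remove the mean so a DC offset does not dominate the correlation.
    l = left - left.mean()
    r = right - right.mean()

    # Full linear cross-correlation; zero lag sits at index len(l) - 1.
    full = np.correlate(l, r, mode="full")
    centre = len(l) - 1
    ccf = full[centre - max_lag: centre + max_lag + 1]

    # Normalise by the channel energies so values lie in [-1, 1].
    norm = np.sqrt(np.sum(l ** 2) * np.sum(r ** 2))
    if norm > 0:
        ccf = ccf / norm

    # ILD: energy ratio of the two ears in dB (eps avoids log of zero).
    eps = 1e-12
    ild_db = 10.0 * np.log10((np.sum(left ** 2) + eps) / (np.sum(right ** 2) + eps))

    lags = np.arange(-max_lag, max_lag + 1)
    return lags, ccf, ild_db


# Usage: the right ear receives a quieter copy of the left signal, 5 samples later.
rng = np.random.default_rng(0)
fs = 16000
left = rng.standard_normal(1024)
right = 0.5 * np.roll(left, 5)

lags, ccf, ild_db = binaural_features(left, right, fs)
# With np.correlate(l, r), a right channel that lags the left by d samples
# produces a CCF peak at lag -d.
peak_lag = int(lags[np.argmax(ccf)])
```

In a system of the kind the abstract describes, the whole CCF vector (rather than only its peak lag) together with the ILD would form the classifier input. The classifier itself and the head-movement strategy are outside the scope of this sketch.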
    Original language: English
    Title of host publication: Proceedings of Interspeech 2015
    Number of pages: 5
    Publisher: ISCA
    Publication date: 2015
    Pages: 3302-3306
    Publication status: Published - 2015
    Event: INTERSPEECH 2015: Speech beyond Speech - Dresden, Germany
    Duration: 6 Sept 2015 to 10 Sept 2015

    Conference

    Conference: INTERSPEECH 2015
    Country/Territory: Germany
    City: Dresden
    Period: 06/09/2015 to 10/09/2015

    Keywords

    • Binaural source localisation
    • Deep neural networks
    • Head movements
    • Machine hearing
    • Reverberation
