Simultaneous localization and identification of speakers in noisy and reverberant environments

Tobias May, Steven Van De Par, Armin Kohlrausch

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

Whereas the human auditory system has remarkable capabilities to focus on a particular target source in complex multi-source scenarios, it has remained a challenging task to develop algorithms that are able to retrieve information about sound sources in a complex acoustic scene (e.g. to localize and identify active speech sources). A robust binaural scene recognizer will be presented that is able to simultaneously localize and classify a predefined number of target speech sources in the presence of reverberation and interfering noise. The model consists of three stages: localization of sound sources, detection of speech sources, and recognition of speaker identities. First, a binaural front-end is used to localize relevant sound source activity. Based on this localization information, a binary mask is determined which identifies the activity of individual sound sources on a time-frequency (T-F) basis. The localization is based on the supervised learning of azimuth-dependent binaural features, namely interaural time and level differences (ITDs and ILDs). Second, for every sound source that has been found, a speech detection module determines whether the source is speech or noise. For this purpose, the estimated binary mask and the corresponding spectral features of each sound source candidate are passed to a missing data classifier. Finally, the speaker identity of all detected speech sources is recognized. The proposed system is analyzed in simulated adverse conditions including interfering noise, reverberation and the presence of multiple target sources. Compared to a state-of-the-art MFCC recognizer, the proposed model achieves significantly higher speaker recognition accuracy.
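The binaural features named in the abstract can be illustrated for a single T-F unit. The sketch below is not the authors' front-end: it assumes the left and right ear signals have already been split into one frequency channel and one time frame (the filterbank is omitted), estimates the ITD as the lag that maximizes the normalized cross-correlation, and estimates the ILD as the left/right energy ratio in dB. The function names `itd_samples` and `ild_db` and the parameter `max_lag` are hypothetical.

```python
import math


def ild_db(left, right, eps=1e-12):
    """Interaural level difference in dB (positive = louder at the left ear)."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10((e_left + eps) / (e_right + eps))


def itd_samples(left, right, max_lag):
    """Lag in samples that maximizes the normalized cross-correlation.

    A positive lag means the right-ear signal is delayed relative to the left.
    Assumption: both frames have the same length; no sub-sample interpolation.
    """
    n = len(left)
    norm = math.sqrt(sum(x * x for x in left) * sum(x * x for x in right)) or 1.0
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        val = acc / norm
        if val > best_val:
            best_val, best_lag = val, lag
    return best_lag


if __name__ == "__main__":
    # Toy T-F unit: an impulse that reaches the right ear 3 samples later
    # and with half the amplitude of the left ear.
    left = [0.0] * 32
    right = [0.0] * 32
    left[10], right[13] = 2.0, 1.0
    print(itd_samples(left, right, max_lag=5))  # → 3
    print(f"{ild_db(left, right):.2f}")  # → 6.02
```

Dividing the lag by the sampling rate converts it to seconds; a practical system would also interpolate around the correlation peak for sub-sample resolution.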
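The abstract states that localization relies on supervised learning of azimuth-dependent ITD/ILD features. As a minimal sketch of that idea, assuming one diagonal Gaussian per azimuth (a simplification; the paper's actual model may differ, e.g. a mixture model per frequency channel), the code below fits per-azimuth means and variances from labeled (ITD, ILD) training pairs and assigns a new feature vector to the azimuth with the highest log-likelihood. All names (`fit_azimuth_models`, `classify_azimuth`, `var_floor`) are hypothetical, and the toy training data are invented for illustration.

```python
import math
from collections import defaultdict


def fit_azimuth_models(samples, var_floor=1e-6):
    """samples: iterable of (azimuth_deg, (itd, ild)) training pairs.

    Returns {azimuth: (means, variances)}, one diagonal Gaussian per azimuth.
    """
    grouped = defaultdict(list)
    for az, feat in samples:
        grouped[az].append(feat)
    models = {}
    for az, feats in grouped.items():
        n, dim = len(feats), len(feats[0])
        means = [sum(f[d] for f in feats) / n for d in range(dim)]
        # Variance floor avoids division by zero for degenerate training data.
        variances = [max(sum((f[d] - means[d]) ** 2 for f in feats) / n, var_floor)
                     for d in range(dim)]
        models[az] = (means, variances)
    return models


def log_likelihood(feat, means, variances):
    """Log density of a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(feat, means, variances))


def classify_azimuth(feat, models):
    """Azimuth whose model gives the highest log-likelihood for this feature."""
    return max(models, key=lambda az: log_likelihood(feat, *models[az]))


if __name__ == "__main__":
    train = [(-30, (-3.0, -6.0)), (-30, (-2.0, -5.0)), (-30, (-4.0, -7.0)),
             (0, (0.0, 0.0)), (0, (1.0, 1.0)), (0, (-1.0, -1.0)),
             (30, (3.0, 6.0)), (30, (2.0, 5.0)), (30, (4.0, 7.0))]
    models = fit_azimuth_models(train)
    print(classify_azimuth((2.8, 5.5), models))  # → 30
```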
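The step from localization to a binary mask can be sketched as follows, assuming an azimuth estimate is already available for every T-F unit. This is an illustrative simplification, not the published procedure: the most frequent azimuths across all units are taken as the active sources (a histogram-peak heuristic), and each unit is assigned to the nearest source, giving one binary mask per source. The names `find_sources` and `binary_masks` are hypothetical, and the sketch does not exclude low-energy units, which a practical system would.

```python
from collections import Counter


def find_sources(azimuth_map, num_sources):
    """Pick the num_sources most frequent azimuths over all T-F units."""
    counts = Counter(az for row in azimuth_map for az in row)
    return [az for az, _ in counts.most_common(num_sources)]


def binary_masks(azimuth_map, sources):
    """One mask per source: 1 where a unit's azimuth is closest to that source.

    azimuth_map[t][f] is the azimuth estimate of frame t, channel f.
    Ties in distance go to the source listed first in `sources`.
    """
    masks = {s: [[0] * len(row) for row in azimuth_map] for s in sources}
    for t, row in enumerate(azimuth_map):
        for f, az in enumerate(row):
            nearest = min(sources, key=lambda s: abs(s - az))
            masks[nearest][t][f] = 1
    return masks


if __name__ == "__main__":
    az_map = [[-30, -30, 30],
              [-30, 30, 30],
              [0, -30, 30]]
    sources = find_sources(az_map, num_sources=2)
    masks = binary_masks(az_map, sources)
    print(sorted(sources))  # → [-30, 30]
    print(masks[30])
```

Because every unit is assigned to exactly one source, the masks partition the T-F plane; the "predefined number of target speech sources" mentioned in the abstract corresponds here to the `num_sources` argument.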
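Both the speech/noise decision and the speaker identification are described as missing data classification driven by the binary mask. A common form of this, assumed here rather than taken from the paper, is marginalization: with diagonal-covariance Gaussian mixture models, integrating out the unreliable (masked) feature dimensions amounts to evaluating the likelihood over the reliable dimensions only. Missing-data techniques typically operate on spectral features because the T-F mask maps directly onto them, which is not the case for cepstral features such as MFCCs. The model format, the names, and the toy numbers below are hypothetical.

```python
import math


def log_gauss_reliable(x, mask, mean, var):
    """Diagonal-Gaussian log density over reliable dimensions (mask == 1) only.

    For a diagonal covariance, marginalizing out the unreliable dimensions
    is equivalent to dropping them from the sum.
    """
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, r, m, v in zip(x, mask, mean, var) if r)


def log_gmm_reliable(x, mask, gmm):
    """gmm: list of (weight, mean, var) components; log-sum-exp over components."""
    terms = [math.log(w) + log_gauss_reliable(x, mask, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(frames, masks, models):
    """Label (e.g. speaker or speech/noise) with the highest total log-likelihood."""
    scores = {label: sum(log_gmm_reliable(x, m, gmm) for x, m in zip(frames, masks))
              for label, gmm in models.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    models = {"A": [(1.0, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])],
              "B": [(1.0, [5.0, 5.0, 5.0], [1.0, 1.0, 1.0])]}
    # Frames of speaker A whose second channel is dominated by noise.
    frames = [[0.1, 9.0, -0.2], [0.0, 9.0, 0.3]]
    masks = [[1, 0, 1], [1, 0, 1]]
    print(classify(frames, masks, models))  # → A
    print(classify(frames, [[1, 1, 1]] * 2, models))  # → B
```

The second call shows why the mask matters in this toy case: when the noise-dominated channel is treated as reliable, it outweighs the clean channels and the frames are attributed to the wrong model.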
Original language: English
Title of host publication: Proceedings of Forum Acusticum
Publication date: 2011
Pages: 2121-2126
Publication status: Published - 2011
Externally published: Yes
Event: Forum Acusticum 2011 - Aalborg, Denmark
Duration: 26 Jun 2011 - 1 Jul 2011
http://www.fa2011.org/

Conference

Conference: Forum Acusticum 2011
Country/Territory: Denmark
City: Aalborg
Period: 26/06/2011 - 01/07/2011
Series: Proceedings of Forum Acusticum
ISSN: 2221-3767
