Predicting binaural speech intelligibility using the signal-to-noise ratio in the envelope power spectrum domain

Alexandre Chabot-Leclerc, Ewen MacDonald, Torsten Dau

Research output: Contribution to journalJournal articleResearchpeer-review

389 Downloads (Pure)

Abstract

This study proposes a binaural extension to the multi-resolution speech-based envelope power spectrum model (mr-sEPSM) [Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134, 436–446]. It consists of a combination of better-ear (BE) and binaural unmasking processes, implemented as two monaural realizations of the mr-sEPSM combined with a short-term equalization-cancellation process, and uses the signal-to-noise ratio in the envelope domain (SNRenv) as the decision metric. The model requires only two parameters to be fitted per speech material and does not require an explicit frequency weighting. The model was validated against three data sets from the literature, which covered the following effects: the number of maskers, the masker types [speech-shaped noise (SSN), speech-modulated SSN, babble, and reversed speech], the masker(s) azimuths, reverberation on the target and masker, and the interaural time difference of the target and masker. The Pearson correlation coefficient between the simulated speech reception thresholds and the data across all experiments was 0.91. A model version that considered only BE processing performed similarly (correlation coefficient of 0.86) to the complete model, suggesting that BE processing could be considered sufficient to predict intelligibility in most realistic conditions.
Original languageEnglish
JournalJournal of the Acoustical Society of America
Volume140
Issue number1
Pages (from-to)192–205
ISSN0001-4966
DOIs
Publication statusPublished - 2016

Fingerprint Dive into the research topics of 'Predicting binaural speech intelligibility using the signal-to-noise ratio in the envelope power spectrum domain'. Together they form a unique fingerprint.

Cite this