In everyday life, the speech we listen to is often mixed with other sound sources as well as reverberation. In such situations, people with normal hearing can segregate a single voice from the background almost effortlessly. In contrast, hearing-impaired people have great difficulty understanding speech when more than one person is talking, even when reduced audibility has been fully compensated for by a hearing aid. The reasons for these difficulties are not well understood. This presentation highlights recent concepts of the monaural and binaural signal-processing strategies employed by the normal as well as the impaired auditory system.

Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII) in conditions with nonlinearly processed speech. Instead of taking the reduction of temporal modulation energy as the intelligibility metric, as assumed in the STI, the sEPSM uses the signal-to-noise ratio in the envelope domain (SNRenv). This metric was shown to be key to predicting the intelligibility of reverberant speech as well as noisy speech processed by spectral subtraction.

However, the sEPSM cannot account for speech subjected to phase jitter, a condition in which the spectral structure of the speech is destroyed while the broadband temporal envelope remains largely intact. The effects of this distortion can, in contrast, be predicted successfully by the spectro-temporal modulation index (STMI) [Elhilali et al. (2003). Speech Commun. 41, 331-348], which assumes an explicit analysis of the spectral modulation energy. However, since the STMI applies the same decision metric as the STI, it fails to account for the effects of spectral subtraction.
Together, the results from the different modeling approaches suggest that the SNRenv is a key decision metric for intelligibility prediction, while an explicit across-frequency (spectro-temporal) analysis appears crucial for extracting the relevant speech features in some conditions.
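To make the SNRenv idea concrete, the following is a minimal sketch, not the published sEPSM: it uses a single broadband channel and the overall Hilbert envelope, whereas the full model applies a gammatone filterbank and a modulation filterbank per channel. All function and variable names here are illustrative assumptions, not taken from any reference implementation.

```python
# Simplified SNRenv sketch (assumption: one broadband channel, no
# modulation filterbank; the real sEPSM is considerably more detailed).
import numpy as np
from scipy.signal import hilbert

def envelope_power(x):
    """Normalized AC power of the Hilbert envelope: variance of the
    envelope fluctuations divided by the squared DC (mean) value."""
    env = np.abs(hilbert(x))
    dc = env.mean()
    return np.mean((env - dc) ** 2) / dc ** 2

def snr_env_db(noisy_speech, noise):
    """SNRenv in dB: the envelope power of the noise alone is
    subtracted from that of the noisy speech and related back to the
    noise envelope power (floored to avoid log of non-positive values)."""
    p_sn = envelope_power(noisy_speech)
    p_n = envelope_power(noise)
    return 10.0 * np.log10(max(p_sn - p_n, 1e-10) / p_n)

# Toy demo: a 4-Hz amplitude-modulated tone stands in for speech, whose
# envelope is dominated by slow modulations of roughly that rate.
fs = 16000
t = np.arange(fs) / fs
carrier = np.sin(2 * np.pi * 1000 * t)
speech = (1.0 + 0.8 * np.sin(2 * np.pi * 4 * t)) * carrier
rng = np.random.default_rng(0)
noise = 0.3 * rng.standard_normal(fs)
loud_noise = 3.0 * rng.standard_normal(fs)

# Stronger noise masks the target's envelope fluctuations, so SNRenv drops.
print(snr_env_db(speech + noise, noise))
print(snr_env_db(speech + loud_noise, loud_noise))
```

The key design point, in line with the abstract, is that the metric is computed in the envelope domain: an unmodulated carrier in the same noise yields a much lower SNRenv than the modulated one, even at the same waveform-level SNR.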
Title of host publication: The Listening Talker: Proceedings
Publication status: Published - 2012
Event: The Listening Talker - An interdisciplinary workshop on natural and synthetic modification of speech in response to listening conditions, Informatics Forum, Edinburgh, United Kingdom
Duration: 2 May 2012 → 3 May 2012