TY - JOUR
T1 - Effects of manipulating the signal-to-noise envelope power ratio on speech intelligibility
AU - Jørgensen, Søren
AU - Decorsière, Remi Julien Blaise
AU - Dau, Torsten
PY - 2015
Y1 - 2015
N2 - Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475–1487] suggested a metric for speech
intelligibility prediction based on the signal-to-noise envelope power ratio (SNRenv), calculated at
the output of a modulation-frequency selective process. In the framework of the speech-based envelope
power spectrum model (sEPSM), the SNRenv was demonstrated to account for speech intelligibility
data in various conditions with linearly and nonlinearly processed noisy speech, as well as for
conditions with stationary and fluctuating interferers. Here, the relation between the SNRenv and
speech intelligibility was investigated further by systematically varying the modulation power of either
the speech or the noise before mixing the two components, while keeping the overall power ratio
of the two components constant. A good correspondence between the data and the
corresponding sEPSM predictions was obtained when the noise was manipulated and mixed with
the unprocessed speech, consistent with the hypothesis that SNRenv is indicative of speech intelligibility.
However, discrepancies between data and predictions occurred for conditions where the
speech was manipulated and the noise left untouched. In these conditions, distortions introduced by
the applied modulation processing were detrimental for speech intelligibility, but not reflected in
the SNRenv metric, thus representing a limitation of the modeling framework.
U2 - 10.1121/1.4908240
DO - 10.1121/1.4908240
M3 - Journal article
C2 - 25786952
SN - 0001-4966
VL - 137
SP - 1401
EP - 1410
JO - Journal of the Acoustical Society of America
JF - Journal of the Acoustical Society of America
IS - 3
ER -