Predicting speech intelligibility in adverse conditions: evaluation of the speech-based envelope power spectrum model
Publication: Research - peer-review › Article in proceedings – Annual report year: 2011
The speech-based envelope power spectrum model (sEPSM) [Jørgensen and Dau (2011). J. Acoust. Soc. Am., 130 (3),
1475–1487] estimates the envelope signal-to-noise ratio (SNRenv) of distorted speech and accurately
describes the speech recognition thresholds (SRT) for normal-hearing listeners in conditions with additive noise, reverberation, and nonlinear
processing by spectral subtraction. The latter represents a condition where the standardized speech intelligibility index and speech transmission index
fail. However, the sEPSM is limited to stationary interferers because its predictions are based on the long-term SNRenv. As an attempt to extend
the model to deal with fluctuating interferers, a short-time version of the sEPSM is presented. The SNRenv of a speech sample is estimated from a
combination of SNRenv-values calculated in short time frames. The model is evaluated in adverse conditions by comparing predictions to measured data
from [Kjems et al. (2009). J. Acoust. Soc. Am., 126 (3), 1415–1426], where speech is mixed with four different interferers: speech-shaped
noise, bottle noise, car noise, and cafe noise. The model accounts well for the differences in intelligibility observed across these interferers. None
of the standardized models successfully describes these data.
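The difference between a long-term and a short-time SNRenv can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that the envelopes of the noisy speech and of the noise alone are already available for a single audio channel. It replaces the sEPSM modulation filterbank with the normalized envelope variance and uses a hypothetical power floor (`ENV_POWER_FLOOR`). It also combines the frame values with a plain arithmetic mean, which is only one possible combination rule. The SNRenv definition itself, (P_env,S+N − P_env,N) / P_env,N, follows Jørgensen and Dau (2011).

```python
from typing import List, Sequence

# Hypothetical lower limit on envelope power (linear units). Assumed value,
# used only to keep SNRenv finite when an envelope is (nearly) flat.
ENV_POWER_FLOOR = 1e-3


def envelope_power(env: Sequence[float]) -> float:
    """Envelope power normalized by the squared envelope mean (DC).

    Simplification (assumption): the envelope variance stands in for the
    power at the output of a modulation filter.
    """
    n = len(env)
    mean = sum(env) / n
    if mean <= 0.0:
        return ENV_POWER_FLOOR
    var = sum((x - mean) ** 2 for x in env) / n
    return max(var / mean ** 2, ENV_POWER_FLOOR)


def snr_env(env_mix: Sequence[float], env_noise: Sequence[float]) -> float:
    """SNRenv = (P_env,S+N - P_env,N) / P_env,N, in linear units."""
    p_mix = envelope_power(env_mix)
    p_noise = envelope_power(env_noise)
    # Speech envelope power is estimated by subtraction and floored, because
    # the noisy-speech power can fall below the noise power.
    p_speech = max(p_mix - p_noise, ENV_POWER_FLOOR)
    return p_speech / p_noise


def short_time_snr_env(env_mix: Sequence[float], env_noise: Sequence[float],
                       frame_len: int) -> float:
    """SNRenv per non-overlapping frame, combined by an arithmetic mean."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    values: List[float] = []
    for start in range(0, len(env_mix) - frame_len + 1, frame_len):
        stop = start + frame_len
        values.append(snr_env(env_mix[start:stop], env_noise[start:stop]))
    if not values:
        raise ValueError("envelope is shorter than one frame")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy envelopes (made up): the noise envelope is nearly flat in the first
    # half and strongly modulated in the second half.
    noise = [1.1, 0.9, 1.1, 0.9, 1.5, 0.5, 1.5, 0.5]
    mix = [1.5, 0.5, 1.5, 0.5, 1.6, 0.4, 1.6, 0.4]
    print(round(snr_env(mix, noise), 2))                # 1.35 (long-term)
    print(round(short_time_snr_env(mix, noise, 4), 2))  # 12.22 (short-time)
```

In the toy example, the noise envelope carries little power in the first frame. That frame therefore contributes a large SNRenv, and the short-time estimate exceeds the long-term one for this fluctuating interferer. The frame length, the floor value, and the averaging rule (linear mean, mean in dB, or something else) are assumptions of this sketch rather than details taken from the paper.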
| Original language | English |
|---|---|
| Title | Proceedings of the 3rd International Symposium on Auditory and Audiological Research : Speech Perception and Auditory Disorders |
| Editors | Torsten Dau, Morten L. Jepsen, Torben Poulsen, Jakob C. Dalsgaard |
| Publisher | The Danavox Jubilee Foundation |
| Publication date | 2011 |
| Pages | 307-314 |
| ISBN (print) | 978-87-990013-3-0 |
| State | Published |
Conference
| Conference | 3rd International Symposium on Auditory and Audiological Research |
|---|---|
| Country | Denmark |
| City | Nyborg |
| Period | 24 August 2011 → 26 August 2011 |
| Internet address | http://www.isaar.eu/ISAAR_2011 |
ID: 9595824