The speech-based envelope power spectrum model (sEPSM) family: Development, achievements, and current challenges

Helia Relaño-Iborra, Alexandre Chabot-Leclerc, Christoph Scheidiger, Johannes Zaar, Torsten Dau

Research output: Contribution to journal › Conference abstract in journal › Research › peer-review


Intelligibility models provide insights into how target speech characteristics, transmission channels, and/or auditory processing affect listeners' speech perception performance. In 2011, Jørgensen and Dau proposed the speech-based envelope power spectrum model [sEPSM, Jørgensen and Dau (2011). J. Acoust. Soc. Am. 130(3), 1475-1487]. It uses the signal-to-noise ratio in the modulation domain (SNRenv) as a decision metric and was shown to accurately predict the intelligibility of processed noisy speech. The sEPSM concept has since been applied in several subsequent models, which have extended the predictive power of the original model to a broad range of conditions. This contribution presents the most recent developments within the sEPSM “family”: (i) a binaural extension, the B-sEPSM [Chabot-Leclerc et al. (2016). J. Acoust. Soc. Am. 140(1), 192-205], which combines better-ear and binaural unmasking processes and accounts for a large variety of spatial phenomena in speech perception; (ii) a correlation-based version [Relaño-Iborra et al. (2016). J. Acoust. Soc. Am. 140(4), 2670-2679], which extends the predictions of the early model to non-linear distortions, such as phase jitter and binary mask processing; and (iii) a recent physiologically inspired extension, which makes it possible to account functionally for the effects of individual hearing impairment on speech perception.
Original language: English
Article number: 3970
Journal: The Journal of the Acoustical Society of America
Publication status: Published - 2017
