Predicting speech intelligibility based on a correlation metric in the envelope power spectrum domain

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

A speech intelligibility prediction model is proposed that combines the auditory processing front end of the multi-resolution speech-based envelope power spectrum model [mr-sEPSM; Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134(1), 436–446] with a correlation back end inspired by the short-time objective intelligibility measure [STOI; Taal, Hendriks, Heusdens, and Jensen (2011). IEEE Trans. Audio Speech Lang. Process. 19(7), 2125–2136]. This “hybrid” model, named sEPSMcorr, is shown to account for the effects of stationary and fluctuating additive interferers as well as for the effects of non-linear distortions, such as spectral subtraction, phase jitter, and ideal time frequency segregation (ITFS). The model shows a broader predictive range than both the original mr-sEPSM (which fails in the phase-jitter and ITFS conditions) and STOI (which fails to predict the influence of fluctuating interferers), albeit with lower accuracy than the source models in some individual conditions. Similar to other models that employ a short-term correlation-based back end, including STOI, the proposed model fails to account for the effects of room reverberation on speech intelligibility. Overall, the model might be valuable for evaluating the effects of a large range of interferers and distortions on speech intelligibility, including consequences of hearing impairment and hearing-instrument signal processing.
Original language: English
Journal: Journal of the Acoustical Society of America
Volume: 140
Issue number: 4
Pages (from-to): 2670–2679
ISSN: 0001-4966
DOI: 10.1121/1.4964505
Publication status: Published - 2016

Bibliographical note

©2016 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY)

Cite this

@article{4e4a3f9e265840838c5c2c94cfdf8408,
title = "Predicting speech intelligibility based on a correlation metric in the envelope power spectrum domain",
abstract = "A speech intelligibility prediction model is proposed that combines the auditory processing front end of the multi-resolution speech-based envelope power spectrum model [mr-sEPSM; J{\o}rgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134(1), 436–446] with a correlation back end inspired by the short-time objective intelligibility measure [STOI; Taal, Hendriks, Heusdens, and Jensen (2011). IEEE Trans. Audio Speech Lang. Process. 19(7), 2125–2136]. This “hybrid” model, named sEPSMcorr, is shown to account for the effects of stationary and fluctuating additive interferers as well as for the effects of non-linear distortions, such as spectral subtraction, phase jitter, and ideal time frequency segregation (ITFS). The model shows a broader predictive range than both the original mr-sEPSM (which fails in the phase-jitter and ITFS conditions) and STOI (which fails to predict the influence of fluctuating interferers), albeit with lower accuracy than the source models in some individual conditions. Similar to other models that employ a short-term correlation-based back end, including STOI, the proposed model fails to account for the effects of room reverberation on speech intelligibility. Overall, the model might be valuable for evaluating the effects of a large range of interferers and distortions on speech intelligibility, including consequences of hearing impairment and hearing-instrument signal processing.",
author = "Helia Rela{\~n}o-Iborra and Tobias May and Johannes Zaar and Christoph Scheidiger and Torsten Dau",
note = "{\textcopyright}2016 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY)",
year = "2016",
doi = "10.1121/1.4964505",
language = "English",
volume = "140",
pages = "2670--2679",
journal = "Journal of the Acoustical Society of America",
issn = "0001-4966",
publisher = "AIP Publishing LLC",
number = "4",

}

Predicting speech intelligibility based on a correlation metric in the envelope power spectrum domain. / Relaño-Iborra, Helia; May, Tobias; Zaar, Johannes; Scheidiger, Christoph; Dau, Torsten.

In: Journal of the Acoustical Society of America, Vol. 140, No. 4, 2016, p. 2670–2679.

TY - JOUR
T1 - Predicting speech intelligibility based on a correlation metric in the envelope power spectrum domain
AU - Relaño-Iborra, Helia
AU - May, Tobias
AU - Zaar, Johannes
AU - Scheidiger, Christoph
AU - Dau, Torsten
N1 - ©2016 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY)
PY - 2016
Y1 - 2016
AB - A speech intelligibility prediction model is proposed that combines the auditory processing front end of the multi-resolution speech-based envelope power spectrum model [mr-sEPSM; Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134(1), 436–446] with a correlation back end inspired by the short-time objective intelligibility measure [STOI; Taal, Hendriks, Heusdens, and Jensen (2011). IEEE Trans. Audio Speech Lang. Process. 19(7), 2125–2136]. This “hybrid” model, named sEPSMcorr, is shown to account for the effects of stationary and fluctuating additive interferers as well as for the effects of non-linear distortions, such as spectral subtraction, phase jitter, and ideal time frequency segregation (ITFS). The model shows a broader predictive range than both the original mr-sEPSM (which fails in the phase-jitter and ITFS conditions) and STOI (which fails to predict the influence of fluctuating interferers), albeit with lower accuracy than the source models in some individual conditions. Similar to other models that employ a short-term correlation-based back end, including STOI, the proposed model fails to account for the effects of room reverberation on speech intelligibility. Overall, the model might be valuable for evaluating the effects of a large range of interferers and distortions on speech intelligibility, including consequences of hearing impairment and hearing-instrument signal processing.

U2 - 10.1121/1.4964505
DO - 10.1121/1.4964505
M3 - Journal article
VL - 140
SP - 2670
EP - 2679
JO - Journal of the Acoustical Society of America
JF - Journal of the Acoustical Society of America
SN - 0001-4966
IS - 4
ER -