Predicting Speech Intelligibility Using a Nonlinear and Level-Dependent Auditory Processing Front End

Research output: Contribution to conference › Poster › Research › peer-review

Abstract

Relaño-Iborra et al. [2016, J. Acoust. Soc. Am., 140(4), 2670-2679] proposed a model, termed sEPSMcorr, which showed that the correlation between the envelope representations of clean and degraded speech is a powerful predictor of speech intelligibility in a wide range of listening conditions. However, due to its simplistic linear preprocessing, sEPSMcorr cannot account for the level-dependent effects and nonlinear properties of sound transduction in the auditory periphery, which is a prerequisite for accounting for the consequences of sensorineural hearing loss. Thus, in the present study, a more realistic, nonlinear preprocessing was combined with the correlation-based back end. Specifically, the front end of the computational auditory signal processing and perception model [CASP; Jepsen et al. (2008), J. Acoust. Soc. Am. 124(1), 422-438] was employed, which has been shown to successfully account for psychoacoustic data in conditions of, e.g., spectral masking, amplitude-modulation detection, and forward masking, for both normal-hearing (NH) and hearing-impaired listeners. The proposed speech-based CASP model, denoted sCASP, receives the clean and degraded speech signals as input. The signals are processed through outer- and middle-ear filtering, a nonlinear auditory filterbank including inner and outer hair-cell processing, adaptation, and a modulation filterbank. The internal representations at the output of these stages are analyzed using a correlation-based back end.
Speech intelligibility predictions obtained with the speech-based CASP implementation are presented and compared to NH listener data obtained in conditions of additive noise, phase jitter, ideal binary mask processing, and reverberation. The results demonstrate a large predictive power of the model. As the front end of sCASP can, unlike the front end of its predecessor sEPSMcorr, be parametrized to account for sensorineural hearing loss, the proposed framework may provide a valuable basis for evaluating the consequences of different aspects of hearing loss on speech intelligibility in the various experimental conditions.
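
The idea of a correlation-based back end described in the abstract can be illustrated with a toy sketch. This is not the sCASP or sEPSMcorr implementation: the CASP front end (ear filtering, nonlinear filterbank, adaptation, modulation filterbank) is replaced here by a crude single-band envelope extractor, and the function names, the smoothing cutoff, and the test signals are all assumptions made for illustration only.

```python
import numpy as np


def crude_envelope(signal, fs, cutoff_hz=30.0):
    """Hypothetical stand-in for an auditory front end:
    half-wave rectification followed by a moving-average smoother."""
    rectified = np.maximum(signal, 0.0)
    win = max(1, int(fs / cutoff_hz))  # window length in samples
    kernel = np.ones(win) / win
    return np.convolve(rectified, kernel, mode="same")


def envelope_correlation(clean, degraded, fs):
    """Pearson correlation between the envelopes of two signals."""
    env_c = crude_envelope(clean, fs)
    env_d = crude_envelope(degraded, fs)
    env_c = env_c - env_c.mean()
    env_d = env_d - env_d.mean()
    denom = np.sqrt(np.sum(env_c ** 2) * np.sum(env_d ** 2))
    if denom == 0.0:
        return 0.0  # a flat envelope carries no correlation information
    return float(np.sum(env_c * env_d) / denom)


# Toy "speech": a noise carrier with a 4 Hz amplitude modulation (assumed signal).
fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
clean = (1.0 + np.sin(2 * np.pi * 4 * t)) * rng.standard_normal(fs)

# Degrade with additive noise at two levels and compare the correlations.
light = clean + 0.1 * rng.standard_normal(fs)
heavy = clean + 5.0 * rng.standard_normal(fs)
print("light noise:", envelope_correlation(clean, light, fs))
print("heavy noise:", envelope_correlation(clean, heavy, fs))
```

In this sketch the correlation drops as the additive noise level rises, which is the qualitative behavior a correlation-based back end relies on. Mapping such a correlation value to a predicted intelligibility score would require a further stage that the abstract does not specify.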
Original language: English
Publication date: 2018
Publication status: Published - 2018
Event: 41st annual ARO Midwinter Meeting - Manchester Grand Hyatt, San Diego, United States
Duration: 10 Feb 2018 to 14 Feb 2018

Conference

Conference: 41st annual ARO Midwinter Meeting
Location: Manchester Grand Hyatt
Country: United States
City: San Diego
Period: 10/02/2018 to 14/02/2018

Cite this

Iborra, H. R., Zaar, J., & Dau, T. (2018). Predicting Speech Intelligibility Using a Nonlinear and Level-Dependent Auditory Processing Front End. Poster session presented at 41st annual ARO Midwinter Meeting, San Diego, United States.