The role of auditory spectro-temporal modulation filtering and the decision metric for speech intelligibility prediction

Alexandre Chabot-Leclerc, Søren Jørgensen, Torsten Dau

    Research output: Contribution to journal › Journal article › Research › peer-review

    Abstract

    Speech intelligibility models typically consist of a preprocessing part that transforms stimuli into some internal (auditory) representation and a decision metric that relates the internal representation to speech intelligibility. The present study analyzed the role of modulation filtering in the preprocessing of different speech intelligibility models by comparing predictions from models that either assume a spectro-temporal (i.e., two-dimensional) or a temporal-only (i.e., one-dimensional) modulation filterbank. Furthermore, the role of the decision metric for speech intelligibility was investigated by comparing predictions from models based on the signal-to-noise envelope power ratio, SNRenv, and the modulation transfer function, MTF. The models were evaluated in conditions of noisy speech (1) subjected to reverberation, (2) distorted by phase jitter, or (3) processed by noise reduction via spectral subtraction. The results suggested that a decision metric based on the SNRenv may provide a more general basis for predicting speech intelligibility than a metric based on the MTF. Moreover, the one-dimensional modulation filtering process was found to be sufficient to account for the data when combined with a measure of across (audio) frequency variability at the output of the auditory preprocessing. A complex spectro-temporal modulation filterbank might therefore not be required for speech intelligibility prediction.
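    As a rough illustration of the SNRenv idea named in the abstract, the sketch below estimates a broadband envelope-power ratio from a noisy-speech signal and the noise alone. It is not the authors' model: the study uses an auditory filterbank followed by modulation filtering, whereas this sketch uses a single broadband Hilbert envelope. The function names (`envelope`, `envelope_power`, `snr_env`) and the `floor` value are assumptions made only for this example.

```python
import numpy as np


def envelope(x):
    """Hilbert envelope of a real signal, via an FFT-based analytic signal."""
    n = len(x)
    spec = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0   # double positive frequencies, zero negative ones
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spec * h))


def envelope_power(x):
    """AC envelope power normalised by the squared mean (DC) envelope."""
    env = envelope(x)
    return np.var(env) / np.mean(env) ** 2


def snr_env(noisy_speech, noise, floor=1e-3):
    """Envelope-domain SNR: (P_mix - P_noise) / P_noise.

    `floor` is a hypothetical lower limit that keeps the ratio positive
    and finite; its value here is an assumption of this sketch.
    """
    p_mix = envelope_power(noisy_speech)
    p_noise = max(envelope_power(noise), floor)
    p_speech = max(p_mix - p_noise, floor)  # speech envelope power estimate
    return p_speech / p_noise
```

    In this simplified form, a mixture whose envelope fluctuates more than the noise alone yields a ratio well above the floor, while a mixture with noise-like envelope statistics yields a ratio near zero. A fuller model would compute the ratio per audio channel and per modulation band before combining the values.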
    Original language: English
    Journal: Journal of the Acoustical Society of America
    Volume: 135
    Issue number: 6
    Pages (from-to): 3502–3512
    ISSN: 0001-4966
    Publication status: Published - 2014
