Bibtex

@article{afb18100858b4ca099e7918699d146e4,
title = "Perceptual Confusions Among Consonants, Revisited: Cross-Spectral Integration of Phonetic-Feature Information and Consonant Recognition",
keywords = "Speech perception, Cross-spectral integration, Consonant recognition, Phonetic features, Information theory",
publisher = "IEEE",
author = "Christiansen, {Thomas Ulrich} and Steven Greenberg",
year = "2012",
doi = "10.1109/TASL.2011.2159202",
volume = "20",
number = "1",
pages = "147--161",
journal = "IEEE Transactions on Audio, Speech, and Language Processing",
issn = "1558-7916",
}

RIS

TY - JOUR

T1 - Perceptual Confusions Among Consonants, Revisited: Cross-Spectral Integration of Phonetic-Feature Information and Consonant Recognition

AU - Christiansen, Thomas Ulrich

AU - Greenberg, Steven

PB - IEEE

PY - 2012

AB - The perceptual basis of consonant recognition was experimentally investigated through a study of how information associated with phonetic features (Voicing, Manner, and Place of Articulation) combines across the acoustic-frequency spectrum. The speech signals, 11 Danish consonants embedded in Consonant + Vowel + Liquid syllables, were partitioned into 3/4-octave bands (“slits”) centered at 750 Hz, 1500 Hz, and 3000 Hz, and presented individually and in two- or three-slit combinations. The amount of information transmitted (IT) was calculated from consonant-confusion matrices for each feature and slit combination. The growth of IT was measured as a function of the number of slits presented and their center frequency for the phonetic features and consonants. The IT associated with Voicing, Manner, and Consonants sums nearly linearly for two-band stimuli irrespective of their center frequency. Adding a third band increases the IT by an amount somewhat less than predicted by linear cross-spectral integration (i.e., a compressive function). In contrast, for Place of Articulation, the IT gained through addition of a second or third slit is far more than predicted by linear, cross-spectral summation. This difference is mirrored in a measure of error-pattern similarity across bands, Symmetric Redundancy. Consonants, as well as Voicing and Manner, share a moderate degree of redundancy between bands. In contrast, the cross-spectral redundancy associated with Place is close to zero, which means the bands are essentially independent in terms of decoding this feature. Because consonant recognition and Place decoding are highly correlated (correlation coefficient r² = 0.99), these results imply that the auditory processes underlying consonant recognition are not strictly linear. This may account for why conventional cross-spectral integration speech models, such as the Articulation Index, Speech Intelligibility Index, and the Speech Transmission Index, do not predict intelligibility and segment recognition well under certain conditions (e.g., discontiguous frequency bands, audio-visual speech).

KW - Speech perception

KW - Cross-spectral integration

KW - Consonant recognition

KW - Phonetic features

KW - Information theory

DO - 10.1109/TASL.2011.2159202

JF - IEEE Transactions on Audio, Speech, and Language Processing

SN - 1558-7916

IS - 1

VL - 20

SP - 147

EP - 161

ER -