Abstract
The perceptual basis of consonant recognition was
experimentally investigated through a study of how information
associated with phonetic features (Voicing, Manner, and Place of
Articulation) combines across the acoustic-frequency spectrum.
The speech signals, 11 Danish consonants embedded in Consonant
+ Vowel + Liquid syllables, were partitioned into 3/4-octave
bands (“slits”) centered at 750 Hz, 1500 Hz, and 3000 Hz, and
presented individually and in two- or three-slit combinations. The
amount of information transmitted (IT) was calculated from consonant-
confusion matrices for each feature and slit combination.
The growth of IT was measured as a function of the number of
slits presented and their center frequency for the phonetic features
and consonants. The IT associated with Voicing, Manner, and
Consonants sums nearly linearly for two-band stimuli irrespective
of their center frequency. Adding a third band increases the IT by
an amount somewhat less than predicted by linear cross-spectral
integration (i.e., a compressive function). In contrast, for Place of
Articulation, the IT gained through addition of a second or third
slit is far more than predicted by linear, cross-spectral summation.
This difference is mirrored in a measure of error-pattern
similarity across bands—Symmetric Redundancy. Consonants, as
well as Voicing and Manner, share a moderate degree of redundancy
between bands. In contrast, the cross-spectral redundancy
associated with Place is close to zero, which means the bands
are essentially independent in terms of decoding this feature.
Because consonant recognition and Place decoding are highly
correlated (correlation coefficient r² = 0.99), these results imply
that the auditory processes underlying consonant recognition
are not strictly linear. This may account for why conventional
cross-spectral integration speech models, such as the Articulation
Index, Speech Intelligibility Index, and the Speech Transmission
Index, do not predict intelligibility and segment recognition well
under certain conditions (e.g., discontiguous frequency bands,
audio-visual speech).
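As an illustration of the analysis summarized above, the following is a minimal sketch of how information transmitted (IT) might be computed from a consonant-confusion matrix, both for whole consonants and for a single phonetic feature. It assumes IT is the mutual information, in bits, between stimulus and response, and that feature-level IT is obtained by pooling consonants that share a feature value. The function names, the four-consonant inventory, and the counts are hypothetical toy values chosen for clarity; they are not data or code from the study.

```python
import numpy as np


def information_transmitted(confusions):
    """Mutual information (bits) between stimulus (rows) and response (columns)."""
    counts = np.asarray(confusions, dtype=float)
    joint = counts / counts.sum()                 # joint probability p(s, r)
    p_stim = joint.sum(axis=1, keepdims=True)     # marginal p(s), column vector
    p_resp = joint.sum(axis=0, keepdims=True)     # marginal p(r), row vector
    expected = p_stim * p_resp                    # p(s) * p(r) under independence
    nonzero = joint > 0                           # empty cells contribute nothing
    return float(np.sum(joint[nonzero] * np.log2(joint[nonzero] / expected[nonzero])))


def collapse_by_feature(confusions, labels):
    """Pool rows and columns of consonants that share the same feature value."""
    counts = np.asarray(confusions, dtype=float)
    classes = sorted(set(labels))
    index = np.array([classes.index(label) for label in labels])
    pooled = np.zeros((len(classes), len(classes)))
    np.add.at(pooled, (index[:, None], index[None, :]), counts)
    return pooled


# Hypothetical toy matrix for /p b t d/ (assumption, not study data):
# Voicing is always received correctly, Place is at chance.
toy = [
    [5, 0, 5, 0],  # stimulus p
    [0, 5, 0, 5],  # stimulus b
    [5, 0, 5, 0],  # stimulus t
    [0, 5, 0, 5],  # stimulus d
]
voicing = ["voiceless", "voiced", "voiceless", "voiced"]
place = ["labial", "labial", "alveolar", "alveolar"]

it_consonant = information_transmitted(toy)
it_voicing = information_transmitted(collapse_by_feature(toy, voicing))
it_place = information_transmitted(collapse_by_feature(toy, place))
print(it_consonant, it_voicing, it_place)
```

In this toy case all of the transmitted information is carried by Voicing and none by Place. Under the same assumptions, a linear cross-spectral prediction for a two-slit stimulus would simply be the sum of the IT values measured for each slit alone, against which the observed two-slit IT can be compared.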
| Original language | English |
|---|---|
| Journal | IEEE Transactions on Audio, Speech and Language Processing |
| Volume | 20 |
| Issue number | 1 |
| Pages (from-to) | 147-161 |
| ISSN | 1558-7916 |
| Publication status | Published - 2012 |
Keywords
- Speech perception
- Cross-spectral integration
- Consonant recognition
- Phonetic features
- Information theory
Fingerprint
Dive into the research topics of 'Perceptual Confusions Among Consonants, Revisited: Cross-Spectral Integration of Phonetic-Feature Information and Consonant Recognition'. Together they form a unique fingerprint.