TY - GEN
T1 - Frequency Selective Filtering of the Modulation Spectrum and its Impact on Consonant Identification
AU - Christiansen, Thomas Ulrich
AU - Greenberg, Steven
PY - 2009
Y1 - 2009
N2 - The spectro-temporal coding of Danish consonants was investigated using an
information-theoretic approach. Listeners were asked to identify eleven different
consonants spoken in a CV[l] syllable context (where C refers to the initial consonant, V
refers to one of three vowels, [I, a, u], and [l] refers to the syllable-final liquid segment).
Each syllable was processed so that only a portion of the original audio spectrum was
present. Narrow (three-quarter octave) bands of speech, with center frequencies of 750
Hz, 1500 Hz and 3000 Hz, were presented individually and in combination with each
other. The modulation spectrum of each band was low-pass filtered at 24, 12, 6 and 3 Hz.
Confusion matrices of the consonant-identification data were computed, and from these
the amount of information transmitted for each of three phonetic feature dimensions
– voicing, manner and place of articulation – was calculated for each condition.
This form of analysis provides a simple means of determining whether information
associated with each phonetic feature dimension combines linearly across the audio
spectrum, and, if not, delineates a method for characterizing the (non-linear) nature of
information integration. In addition, the analysis provides a means to associate specific
portions of the modulation spectrum with phonetic feature properties. Such analyses
indicate that:
(1) Accurate, robust decoding of place-of-articulation information requires
broadband cross-spectral integration.
(2) Place-of-articulation information is associated most closely with the modulation
spectrum above 6 Hz, with the most significant contribution coming from the
region above 12 Hz.
(3) Place-of-articulation information is crucial for accurate consonant recognition.
Hence, consonant decoding requires cross-spectral integration of the modulation
spectrum above 8 Hz.
(4) Voicing is mainly associated with the modulation spectrum between 3 and 6 Hz
(with a smaller contribution made by the region above 12 Hz).
(5) Manner of articulation is most closely associated with the portion of the
modulation spectrum above 12 Hz.
This form of information-theoretic analysis can be used to delineate those parts of the
speech signal of greatest importance for encoding phonetic features associated with
intelligibility and speech understanding.
AB - The spectro-temporal coding of Danish consonants was investigated using an
information-theoretic approach. Listeners were asked to identify eleven different
consonants spoken in a CV[l] syllable context (where C refers to the initial consonant, V
refers to one of three vowels, [I, a, u], and [l] refers to the syllable-final liquid segment).
Each syllable was processed so that only a portion of the original audio spectrum was
present. Narrow (three-quarter octave) bands of speech, with center frequencies of 750
Hz, 1500 Hz and 3000 Hz, were presented individually and in combination with each
other. The modulation spectrum of each band was low-pass filtered at 24, 12, 6 and 3 Hz.
Confusion matrices of the consonant-identification data were computed, and from these
the amount of information transmitted for each of three phonetic feature dimensions
– voicing, manner and place of articulation – was calculated for each condition.
This form of analysis provides a simple means of determining whether information
associated with each phonetic feature dimension combines linearly across the audio
spectrum, and, if not, delineates a method for characterizing the (non-linear) nature of
information integration. In addition, the analysis provides a means to associate specific
portions of the modulation spectrum with phonetic feature properties. Such analyses
indicate that:
(1) Accurate, robust decoding of place-of-articulation information requires
broadband cross-spectral integration.
(2) Place-of-articulation information is associated most closely with the modulation
spectrum above 6 Hz, with the most significant contribution coming from the
region above 12 Hz.
(3) Place-of-articulation information is crucial for accurate consonant recognition.
Hence, consonant decoding requires cross-spectral integration of the modulation
spectrum above 8 Hz.
(4) Voicing is mainly associated with the modulation spectrum between 3 and 6 Hz
(with a smaller contribution made by the region above 12 Hz).
(5) Manner of articulation is most closely associated with the portion of the
modulation spectrum above 12 Hz.
This form of information-theoretic analysis can be used to delineate those parts of the
speech signal of greatest importance for encoding phonetic features associated with
intelligibility and speech understanding.
M3 - Article in proceedings
SN - 978-87-593-1479-1
T3 - Copenhagen Studies in Language
SP - 119
EP - 133
BT - Linguistic Theory and Raw Sound
A2 - Juel-Henrichsen, Peter
PB - Samfundslitteratur
T2 - Linguistic Theory and Raw Sound
Y2 - 1 January 2009
ER -