Frequency Selective Filtering of the Modulation Spectrum and its Impact on Consonant Identification

Thomas Ulrich Christiansen, Steven Greenberg, A.N. Rasmussen (Editor), Torben Poulsen (Editor)

Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review


The spectro-temporal coding of Danish consonants was investigated using an information-theoretic approach. Listeners were asked to identify eleven different consonants spoken in a CV[l] syllable context (where C refers to the initial consonant, V refers to one of three vowels, [I, a, u], and [l] refers to the syllable-final liquid segment). Each syllable was processed so that only a portion of the original audio spectrum was present. Narrow (three-quarter octave) bands of speech, with center frequencies of 750 Hz, 1500 Hz and 3000 Hz, were presented individually and in combination with each other. The modulation spectrum of each band was low-pass filtered at 24, 12, 6 and 3 Hz. Confusion matrices of the consonant-identification data were computed, and from these the amount of information transmitted for each of three phonetic feature dimensions (voicing, manner and place of articulation) was calculated for each condition. This form of analysis provides a simple means of determining whether information associated with each phonetic feature dimension combines linearly across the audio spectrum and, if it does not, delineates a method for characterizing the (non-linear) nature of information integration. In addition, the analysis provides a means to associate specific portions of the modulation spectrum with phonetic feature properties. Such analyses indicate that: (1) Accurate, robust decoding of place-of-articulation information requires broadband cross-spectral integration. (2) Place-of-articulation information is associated most closely with the modulation spectrum above 6 Hz, with the most significant contribution coming from the region above 12 Hz. (3) Place-of-articulation information is crucial for accurate consonant recognition. Hence, consonant decoding requires cross-spectral integration of the modulation spectrum above 6 Hz. (4) Voicing is mainly associated with the modulation spectrum between 3 and 6 Hz (with a smaller contribution made by the region above 12 Hz). (5) Manner of articulation is most closely associated with the portion of the modulation spectrum above 12 Hz. This form of information-theoretic analysis can be used to delineate those parts of the speech signal of greatest importance for encoding phonetic features associated with intelligibility and speech understanding.
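The feature-level analysis described in the abstract can be illustrated with a minimal Python sketch. It assumes the standard approach of collapsing a consonant confusion matrix onto feature categories and computing the mutual information between stimulus and response features, normalized by the stimulus entropy. The abstract does not give the authors' exact computation, so the function name `transmitted_information`, the data layout, and the four-consonant example with its counts are all hypothetical and for illustration only.

```python
from collections import defaultdict
from math import log2


def transmitted_information(confusions, feature_of):
    """Relative information transmitted (0 to 1) for one phonetic feature.

    confusions: dict mapping (stimulus, response) -> response count
    feature_of: dict mapping each consonant -> its feature value
    Both structures are hypothetical; the source does not specify a format.
    """
    # Collapse the consonant confusion matrix onto feature categories.
    joint = defaultdict(float)
    for (stim, resp), count in confusions.items():
        joint[(feature_of[stim], feature_of[resp])] += count

    total = sum(joint.values())

    # Marginal probabilities of stimulus and response feature values.
    p_stim = defaultdict(float)
    p_resp = defaultdict(float)
    for (x, y), n in joint.items():
        p_stim[x] += n / total
        p_resp[y] += n / total

    # Mutual information between stimulus and response features, in bits.
    mutual_info = sum(
        (n / total) * log2((n / total) / (p_stim[x] * p_resp[y]))
        for (x, y), n in joint.items()
        if n > 0
    )

    # Normalize by the stimulus entropy to get a proportion transmitted.
    stim_entropy = -sum(p * log2(p) for p in p_stim.values() if p > 0)
    return mutual_info / stim_entropy if stim_entropy > 0 else 0.0


# Hypothetical feature assignments for four consonants (not the study's set).
VOICING = {"p": "voiceless", "b": "voiced", "t": "voiceless", "d": "voiced"}
PLACE = {"p": "labial", "b": "labial", "t": "alveolar", "d": "alveolar"}

# Made-up counts: listeners always get voicing right but guess at place.
place_confused = {
    ("p", "p"): 5, ("p", "t"): 5,
    ("t", "t"): 5, ("t", "p"): 5,
    ("b", "b"): 5, ("b", "d"): 5,
    ("d", "d"): 5, ("d", "b"): 5,
}

voicing_score = transmitted_information(place_confused, VOICING)
place_score = transmitted_information(place_confused, PLACE)
```

Running the same function on the confusion matrix for each band combination and each modulation low-pass cutoff would give one score per feature per condition, which is the kind of table from which linear or non-linear combination across bands could be assessed.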
Original language: English
Title of host publication: Proceedings of the 21st Danavox Symposium "Hearing Aid Fitting"
Publication date: 2005
Publication status: Published - 2005
Event: 21st Danavox Symposium - Kolding, Denmark
Duration: 31 Aug 2005 to 2 Sept 2005
Conference number: 21


Conference: 21st Danavox Symposium

