Frequency Selective Filtering of the Modulation Spectrum and its Impact on Consonant Identification

Publication: Research - peer-review; Article in proceedings – Annual report year: 2009

The spectro-temporal coding of Danish consonants was investigated using an information-theoretic approach. Listeners were asked to identify eleven different consonants spoken in a CV[l] syllable context (where C refers to the initial consonant, V refers to one of three vowels, [I, a, u], and [l] refers to the syllable-final liquid segment). Each syllable was processed so that only a portion of the original audio spectrum was present. Narrow (three-quarter octave) bands of speech, with center frequencies of 750 Hz, 1500 Hz and 3000 Hz, were presented individually and in combination with each other. The modulation spectrum of each band was low-pass filtered at 24, 12, 6 and 3 Hz. Confusion matrices of the consonant-identification data were computed, and from these the amount of information transmitted for each of three phonetic feature dimensions – voicing, manner and place of articulation – was calculated for each condition. This form of analysis provides a simple means of determining whether information associated with each phonetic feature dimension combines linearly across the audio spectrum, and, if not, delineates a method for characterizing the (non-linear) nature of information integration. In addition, the analysis provides a means to associate specific portions of the modulation spectrum with phonetic feature properties. Such analyses indicate that: (1) Accurate, robust decoding of place-of-articulation information requires broadband cross-spectral integration. (2) Place-of-articulation information is associated most closely with the modulation spectrum above 6 Hz, with the most significant contribution coming from the region above 12 Hz. (3) Place-of-articulation information is crucial for accurate consonant recognition. Hence, consonant decoding requires cross-spectral integration of the modulation spectrum above 8 Hz. (4) Voicing is mainly associated with the modulation spectrum between 3 and 6 Hz (with a smaller contribution made by the region above 12 Hz). (5) Manner of articulation is most closely associated with the portion of the modulation spectrum above 12 Hz. This form of information-theoretic analysis can be used to delineate those parts of the speech signal of greatest importance for encoding phonetic features associated with intelligibility and speech understanding.
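The step from confusion matrices to information transmitted per phonetic feature can be illustrated with a minimal sketch. It assumes the common mutual-information formulation for consonant confusions, in which responses are pooled by feature value before the information is computed; the abstract does not state the exact formula used, so this is one plausible reading and not the authors' procedure. The four consonants, the feature tables and the counts below are hypothetical toy values, not data from the study.

```python
import math
from collections import defaultdict

def transmitted_information(confusions):
    """Mutual information in bits between stimulus and response.

    confusions maps (stimulus, response) -> count.
    """
    total = sum(confusions.values())
    stim = defaultdict(float)   # row sums: how often each stimulus was presented
    resp = defaultdict(float)   # column sums: how often each response was given
    for (s, r), n in confusions.items():
        stim[s] += n
        resp[r] += n
    info = 0.0
    for (s, r), n in confusions.items():
        if n > 0:
            # p(s, r) * log2( p(s, r) / (p(s) * p(r)) )
            info += (n / total) * math.log2(n * total / (stim[s] * resp[r]))
    return info

def collapse_by_feature(confusions, feature_of):
    """Pool consonant confusions into confusions between feature values."""
    pooled = defaultdict(float)
    for (s, r), n in confusions.items():
        pooled[(feature_of[s], feature_of[r])] += n
    return dict(pooled)

def relative_transmitted_information(confusions):
    """Transmitted information divided by the stimulus entropy (0..1)."""
    total = sum(confusions.values())
    stim = defaultdict(float)
    for (s, _), n in confusions.items():
        stim[s] += n
    entropy = -sum((n / total) * math.log2(n / total) for n in stim.values() if n > 0)
    return transmitted_information(confusions) / entropy if entropy > 0 else 0.0

# Hypothetical feature tables for four toy consonants (assumption, not from the study).
VOICING = {"p": "voiceless", "t": "voiceless", "b": "voiced", "d": "voiced"}
PLACE = {"p": "labial", "b": "labial", "t": "alveolar", "d": "alveolar"}

# Hypothetical counts: voicing is always heard correctly, place is at chance.
toy_counts = {
    ("p", "p"): 5, ("p", "t"): 5, ("t", "p"): 5, ("t", "t"): 5,
    ("b", "b"): 5, ("b", "d"): 5, ("d", "b"): 5, ("d", "d"): 5,
}

voicing_bits = transmitted_information(collapse_by_feature(toy_counts, VOICING))
place_bits = transmitted_information(collapse_by_feature(toy_counts, PLACE))
print(f"voicing: {voicing_bits:.2f} bits, place: {place_bits:.2f} bits")
```

In this toy case the full bit of voicing information is transmitted while no place information gets through. Computing the same quantities for each band combination and each modulation low-pass cutoff would allow the kind of per-feature comparison across conditions that the abstract describes.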
Original language: English
Title: Linguistic Theory and Raw Sound
Editors: Peter Juel-Henrichsen
Number of pages: 260
Publisher: Samfundslitteratur
Publication date: 2009
Pages: 119-133
ISBN (print): 978-87-593-1479-1
State: Published

Conference

Conference: Linguistic Theory and Raw Sound
City: Mullsjö, Sweden
Period: 01/01/09 → …
Name: Copenhagen Studies in Language
Number: 40

ID: 5650198