Listening talkers produce great spectral tilt contrasts

Thomas Ulrich Christiansen, Jan Heegård, Peter Juel Henrichsen

Research output: Contribution to conference › Conference abstract for conference › Research › peer-review



It is well known that the envelope of the long-term average speech spectrum flattens with increasing vocal effort. A recent study [1] showed that content words had a flatter spectral envelope than function words at the same overall level for a specific Danish
speech material. The present paper investigates whether this effect is present in a larger and more diverse speech material, and whether the effect is greater when the talker is listening (participating in a dialogue) than in monologue. The monologue speech material consisted of recordings from 18 native talkers of Danish describing a network of colored geometrical shapes, taken from DanPASS [2]. The spectral tilt was gauged by calculating, in 5 ms intervals, the band-level difference in dB between two frequency bands with pass-bands of 150 to 803 Hz and 803 to 1358 Hz, respectively.
This was done separately for intervals containing content words and function words, grouped by talker. The spectral tilt difference was then calculated, per talker, as the average band-level difference for function words minus the average band-level difference for content words. For the monologues these differences ranged between 5 and 8 dB for the 18 talkers. Content words were defined as nouns, active verbs, adjectives and adverbs. Function words were defined as articles, pronouns, conjunctions and auxiliary verbs. Words not belonging to any of these categories were not used. The dialogue speech material was also from DanPASS and consisted of recordings from 13 of the same talkers as the monologues. In the dialogue speech material, talkers were asked to describe a map with certain discrepancies and negotiate their
way through the map. Spectral tilt differences between content and function words were calculated in the same way as for the monologues. The results show that the spectral tilt differences are slightly higher for dialogues than for monologues. A two-way ANOVA (grouped by talker and word type) showed that these differences
are significant. We conclude that Danish talkers mark high information
density in spontaneous speech (i.e., content words) by means of a flat spectral envelope, not just in monologues but also in dialogues. Moreover, when engaged in dialogue, talkers enhance this spectral flattening. In our view it is remarkable that conclusions with statistical validity can be reached based on the over-simplified definition of spectral tilt employed in this paper. We speculate that optimizing both the definition of spectral tilt and the word categories comprising content and function words may allow us to
observe even greater effects than reported here. The eventual goal of this line of research is to devise a simple, tractable method for distinguishing high information
content from low information content in speech, based on the ubiquitous assumption that content words carry more information than function words. Such a method could
potentially be applied in hearing aids, cochlear implants and automatic speech recognition.
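
The tilt measure described in the abstract can be illustrated with a minimal sketch. Only the two pass-bands, the 5 ms interval length and the "function minus content" per-talker difference come from the abstract. Everything else is an assumption made for illustration: the sampling rate, the zero-padded DFT used to estimate band levels (the abstract does not say whether filters or a transform were used), the sign convention (low band minus high band), and all function names.

```python
import math
from collections import defaultdict
from statistics import mean

LOW_BAND = (150.0, 803.0)    # Hz, pass-band stated in the abstract
HIGH_BAND = (803.0, 1358.0)  # Hz, pass-band stated in the abstract


def band_level_db(frame, fs, band, n_fft=1024):
    """Level in dB (arbitrary reference) of one frame inside `band`.

    Assumption: band power is estimated by summing squared magnitudes of a
    zero-padded DFT with a rectangular window; the abstract does not
    specify the estimator.
    """
    f_lo, f_hi = band
    k_lo = math.ceil(f_lo * n_fft / fs)
    k_hi = math.ceil(f_hi * n_fft / fs)  # half-open range: no bin is counted twice
    power = 0.0
    for k in range(k_lo, k_hi):
        w = 2.0 * math.pi * k / n_fft
        re = sum(x * math.cos(w * n) for n, x in enumerate(frame))
        im = sum(x * math.sin(w * n) for n, x in enumerate(frame))
        power += re * re + im * im
    return 10.0 * math.log10(power + 1e-20)  # small offset avoids log(0)


def band_level_difference(frame, fs):
    """Low-band level minus high-band level in dB (sign convention assumed)."""
    return band_level_db(frame, fs, LOW_BAND) - band_level_db(frame, fs, HIGH_BAND)


def split_into_frames(signal, fs, frame_s=0.005):
    """Cut a signal into consecutive, non-overlapping 5 ms frames."""
    n = round(frame_s * fs)
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def tilt_difference_per_talker(records):
    """Per-talker spectral tilt difference.

    `records` is an iterable of (talker, word_type, band_level_difference_db),
    with word_type either "content" or "function". Returns, per talker, the
    mean for function words minus the mean for content words.
    """
    groups = defaultdict(lambda: {"content": [], "function": []})
    for talker, word_type, diff_db in records:
        groups[talker][word_type].append(diff_db)
    return {t: mean(g["function"]) - mean(g["content"]) for t, g in groups.items()}
```

With this sign convention, a frame dominated by energy below 803 Hz yields a positive band-level difference, so a flatter spectrum gives a smaller value. That is consistent with the positive "function minus content" differences reported for the monologues, but the sign convention itself remains an assumption. Note also that a 5 ms frame gives only about 200 Hz of frequency resolution, so the band edges are necessarily blurred in this sketch.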
Original language: English
Publication date: 2012
Publication status: Published - 2012
Event: The Listening Talker: An interdisciplinary workshop on natural and synthetic modification of speech in response to listening conditions - Informatics Forum, Edinburgh, United Kingdom
Duration: 2 May 2012 to 3 May 2012


Conference: The Listening Talker
Location: Informatics Forum
Country: United Kingdom


  • Spectral tilt
  • Spectral envelope
  • Speech production
  • Speech perception
  • Content words
  • Function words


