Audio-visual scene analysis in reverberant multi-talker environments

Axel Ahrens, Kasper Duemose Lund, Torsten Dau

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review



Normal-hearing subjects can localize sound sources accurately even in reverberant multi-talker environments (e.g., Kopčo, 2010; Weller, 2016). Weller et al. (2016) showed that subjects can accurately analyse reverberant multi-talker scenes with up to four simultaneous talkers. While multi-talker scene analysis has mainly been investigated using auditory information alone, the addition of visual information might influence subjects’ perception. To investigate this visual influence, the present study considered audio-visual scenes with a varying number of talkers and varying degrees of reverberation. The acoustic information was presented via a spherical loudspeaker array and the visual information via head-tracked virtual reality glasses. The visual information represented various possible talker locations, and the subjects were asked to identify the number of talkers and their specific locations. To identify the talkers, subjects had to label the visual locations with headlines corresponding to each talker’s speech topic. It was hypothesized that the addition of visual information improves subjects’ ability to analyse complex auditory scenes, while increasing reverberation impairs overall performance.

Original language: English
Title of host publication: Proceedings of the 23rd International Congress on Acoustics
Publisher: Deutsche Gesellschaft für Akustik e.V.
Publication date: 2019
ISBN (Print): 978-3-939296-15-7
Publication status: Published - 2019
Event: 23rd International Congress on Acoustics - Eurogress, Aachen, Germany
Duration: 9 Sept 2019 to 13 Sept 2019


Conference: 23rd International Congress on Acoustics


  • Auditory Scene Analysis
  • Speech Perception
  • Virtual Reality


