Closed-loop attention control of audio-visual speech

Alessandro Catania, Daniel D.E. Wong, Jonatan Märcher-Rørsted, Torsten Dau, Alain de Cheveigne, Jens Hjortkjær

Research output: Contribution to conference › Conference abstract for conference › Research › peer-review


When attending to a speech source in acoustic environments with many talkers, low-frequency activity in auditory cortex is known to be selectively synchronized with slow amplitude fluctuations in the attended speech signal. In everyday communication, a listener can typically also see the face of the attended talker, but it remains unclear how attention-driven speech processing is influenced by visual information. Here, we investigated the impact of visual information on a closed-loop system that decodes the attended talker from scalp EEG and then amplifies the acoustic speech signal of that talker. To decode attention in real time from scalp EEG, we used canonical correlation analysis (CCA) to relate multichannel EEG to a model of the audio-visual (AV) speech stimulus. First, we investigated a model of the temporal envelope of the acoustic speech signals passed through a modulation filtering stage mimicking the auditory midbrain. We found higher attention decoding accuracy and faster attention switching of the closed-loop system for listeners trained with audio-visual speech, compared to listeners presented with audio only. We also observed an earlier response to the acoustic envelope with audio-visual speech compared to audio-only speech. Next, we found that the attended talker could be decoded based on a CCA model of visual features alone, using a measure of optical flow. Finally, combining audio and visual features in a CCA model improved accuracy further compared to models based on either auditory or visual features alone.
Original language: English
Publication date: 2019
Publication status: Published - 2019
Event: Bernstein Conference 2019 - Berlin, Germany
Duration: 17 Sep 2019 - 20 Sep 2019


Conference: Bernstein Conference 2019


