
Time and Uncertainty in Computational Models of Audio-Visual Speech Integration

  • Agata Wlaszczyk

Research output: Book/Report › Ph.D. thesis


Abstract

Speech perception is a complex process that involves combining auditory and visual information into a coherent percept. To do so, the brain must continuously assess whether the incoming noisy sensory information is coherent, and either integrate or separate it. When integrated, incongruent cues from different modalities may result in illusions. One well-known example is the McGurk illusion, where conflicting auditory and visual syllabic cues lead to either fused or combined percepts. In this thesis, we use the McGurk effect to extend existing models of multisensory integration of speech into new domains through two computational studies.

In the first study, we develop a computational Maximum Likelihood Estimation (MLE) model that emphasizes the temporal aspect of cue influence on perception. The model is based on the assumption that audio-visual speech features are integrated sequentially. We use the model to predict responses to congruent and incongruent audio-visual syllable utterances, including single consonants and consonant clusters. We show that it successfully predicts all audio-visual effects found in the data, including both types of the McGurk illusion, and that it produces more generalizable predictions than other well-known models of audio-visual speech integration. In addition, we demonstrate that the model yields interpretable parameter values that correspond to each cue's contribution to the final percept.
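The thesis model is sequential and categorical, but the MLE framework it builds on is the classic reliability-weighted cue combination. As background, a minimal sketch of that standard rule (with hypothetical Gaussian cue estimates, not the thesis's actual feature model):

```python
def mle_fuse(mu_a, var_a, mu_v, var_v):
    """Reliability-weighted MLE fusion of an auditory and a visual cue.

    Each cue is summarized as a Gaussian estimate (mean, variance);
    the maximum-likelihood combination weights each cue by its
    reliability (inverse variance).
    """
    w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_v)
    w_v = 1.0 - w_a
    mu = w_a * mu_a + w_v * mu_v
    # The fused estimate is never less reliable than either cue alone.
    var = 1.0 / (1.0 / var_a + 1.0 / var_v)
    return mu, var

# A reliable auditory cue (low variance) dominates a noisy visual one:
mu, var = mle_fuse(mu_a=1.0, var_a=0.5, mu_v=3.0, var_v=2.0)
```

Here the fused mean lands at 1.4, much closer to the reliable auditory cue, and the fused variance (0.4) is smaller than either input variance.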

In the second study, we present a computational model based on the Bayesian Causal Inference (BCI) framework to explain both perceptual responses and confidence ratings in a behavioral experiment. We investigate several computational hypotheses about how confidence arises from perceptual uncertainty in a categorization task with multiple discrete categories. We find that sensory noise alone cannot account for the distributions of confidence ratings, suggesting that they are influenced by additional factors, such as decision noise. Furthermore, our results provide moderate support for a confidence strategy in which ratings depend on the difference between the posterior probability of the chosen perceptual response and that of the next most probable option.
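The best-supported confidence rule described above can be sketched directly; this is an illustrative toy (the category posterior here is a made-up example, not the thesis's fitted BCI model):

```python
import numpy as np

def choice_and_confidence(posterior):
    """Pick the maximum-a-posteriori category and compute confidence
    as the gap between the top posterior probability and the
    runner-up, i.e. the rule the study finds moderate support for.
    """
    posterior = np.asarray(posterior, dtype=float)
    order = np.sort(posterior)[::-1]          # probabilities, descending
    choice = int(np.argmax(posterior))        # chosen category index
    confidence = order[0] - order[1]          # margin over next-best option
    return choice, confidence

# Hypothetical posterior over three syllable categories, e.g. {ba, da, ga}:
choice, conf = choice_and_confidence([0.6, 0.3, 0.1])
```

A near-uniform posterior yields confidence close to zero even though a category is still chosen, which is the qualitative behavior that distinguishes this rule from simply reporting the maximum posterior probability.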

The goal of the current work is to develop models that account for a wider range of perceptual phenomena related to speech perception and to extend existing Bayesian observer models of audio-visual integration. Additionally, we aim to show the importance of quantitative methods such as cross-validation and parameter and model recovery for the critical assessment of computational models.
Original language: English
Publisher: Technical University of Denmark
Number of pages: 118
Publication status: Published - 2024
