Regularized models of audiovisual integration of speech with predictive power for sparse behavioral data

Tobias S. Andersen*, Ole Winther

*Corresponding author for this work

Research output: Contribution to journal, Journal article, Research, peer-review

Abstract

Audiovisual integration can facilitate speech comprehension by combining information from lip-reading with auditory speech perception. When incongruent acoustic speech is dubbed onto a video of a talking face, this integration can lead to the McGurk illusion of hearing a different phoneme than that spoken by the voice. Several computational models of the information integration process underlying these phenomena exist. All are based on the assumption that the integration process is, in some sense, optimal. They differ, however, in whether they assume a continuous or a categorical internal representation. Here we develop models of audiovisual integration of phonetic information based on an internal representation that is continuous and cyclical. We compare these models to the Fuzzy Logical Model of Perception (FLMP), which is based on a categorical internal representation. Using cross-validation, we show that model evaluation criteria based on goodness-of-fit are poor measures of the models’ generalization error, even if they take the number of free parameters into account. We also show that the predictive power of all the models benefits from regularization that limits the precision of the internal representation. Finally, we show that, unlike the FLMP, models based on a continuous internal representation have good predictive power when properly regularized.
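
As an illustration of the categorical representation underlying the FLMP, the model's standard integration rule multiplies the auditory and visual support for each response category and normalizes across categories. The following is a minimal Python sketch of that rule only. The function name and the support values are hypothetical and are not taken from the article, and the article's continuous cyclical models and its regularization scheme are not reproduced here.

```python
def flmp_response_probs(auditory_support, visual_support):
    """Sketch of the FLMP integration rule (illustrative, not the authors' code).

    Each argument is a list of non-negative support values, one per
    response category. Supports are multiplied category by category and
    then normalized so the resulting probabilities sum to 1.
    """
    if len(auditory_support) != len(visual_support):
        raise ValueError("both modalities need one support value per category")
    products = [a * v for a, v in zip(auditory_support, visual_support)]
    total = sum(products)
    if total <= 0:
        raise ValueError("at least one category needs non-zero joint support")
    return [p / total for p in products]


# Hypothetical support values for three response categories (assumed data).
auditory = [0.8, 0.1, 0.1]
visual = [0.1, 0.3, 0.6]
probs = flmp_response_probs(auditory, visual)
print(probs)
```

Because the rule is multiplicative, a category with support close to zero in either modality is almost excluded from the response, which is one way to see why limiting the precision of the internal representation can matter when such a model is fitted to sparse data.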

Original language: English
Article number: 102404
Journal: Journal of Mathematical Psychology
Volume: 98
Number of pages: 8
ISSN: 0022-2496
Publication status: Published - Sept 2020

Keywords

  • Audiovisual integration
  • Computational models
  • Speech perception
