Temporal visual cues aid speech recognition

Xiang Zhou, Lars Ross, Tue Lehn-Schiøler, Lucas Parra

    Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

    Abstract

    BACKGROUND: It is well known that under noisy conditions, viewing a speaker's articulatory movement aids the recognition of spoken words. Conventionally it is thought that the visual input disambiguates otherwise confusing auditory input. HYPOTHESIS: In contrast, we hypothesize that it is the temporal synchronicity of the visual input that aids parsing of the auditory stream. More specifically, we expected that purely temporal information, which does not convey information such as place of articulation, may facilitate word recognition. METHODS: To test this prediction we used temporal features of audio to generate an artificial talking-face video and measured word recognition performance on simple monosyllabic words. RESULTS: When presenting words together with the artificial video we find that word recognition is improved over purely auditory presentation. The effect is significant (p
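
    The METHODS passage above describes generating an artificial talking-face video from purely temporal audio features. As a rough illustration of that idea (a minimal sketch, not the authors' implementation), the Python code below extracts a per-video-frame amplitude envelope from an audio waveform and maps it to a mouth-opening parameter for an artificial face. The broadband amplitude envelope as the temporal feature, the 16 kHz audio rate, and the 30 Hz video frame rate are assumptions; the abstract does not specify which temporal features or rates were used.

    # Sketch only: drive an "artificial talking face" from purely temporal
    # audio features. The feature used here (broadband amplitude envelope)
    # is an assumption; the abstract only says "temporal features of audio".
    import numpy as np

    AUDIO_RATE = 16000  # Hz, assumed audio sampling rate
    FRAME_RATE = 30     # Hz, assumed video frame rate

    def amplitude_envelope(audio, audio_rate=AUDIO_RATE, frame_rate=FRAME_RATE):
        """Rectify the waveform and average it within each video frame,
        yielding one envelope value per video frame."""
        samples_per_frame = audio_rate // frame_rate
        n_frames = len(audio) // samples_per_frame
        rectified = np.abs(audio[:n_frames * samples_per_frame])
        return rectified.reshape(n_frames, samples_per_frame).mean(axis=1)

    def mouth_opening(envelope):
        """Normalize the envelope to a 0..1 mouth-opening parameter that a
        face renderer could animate frame by frame, in sync with the audio."""
        env = envelope - envelope.min()
        peak = env.max()
        return env / peak if peak > 0 else env

    if __name__ == "__main__":
        # Toy stand-in for a recorded monosyllabic word: white noise shaped
        # by a rise-and-fall gain over half a second.
        t = np.linspace(0.0, 0.5, int(0.5 * AUDIO_RATE))
        gain = np.sin(np.pi * t / t[-1]) ** 2
        audio = gain * np.random.randn(len(t))

        frames = mouth_opening(amplitude_envelope(audio))
        print(f"{len(frames)} video frames; mouth opening ranges "
              f"{frames.min():.2f} to {frames.max():.2f}")

    Rendering these per-frame opening values as, for example, an ellipse whose height tracks the parameter would yield a schematic face whose motion carries only timing information, which is the kind of stimulus the abstract contrasts with natural articulatory video.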
    Original language: English
    Title of host publication: 7th Annual Meeting of the International Multisensory Research Forum
    Publication date: 2006
    Publication status: Published - 2006
    Event: 7th Annual Meeting of the International Multisensory Research Forum
    Duration: 1 Jan 2006 → …

    Conference

    Conference: 7th Annual Meeting of the International Multisensory Research Forum
    Period: 01/01/2006 → …

    Cite this

    Zhou, X., Ross, L., Lehn-Schiøler, T., & Parra, L. (2006). Temporal visual cues aid speech recognition. In 7th Annual Meeting of the International Multisensory Research Forum.