Self-organizing maps for measuring similarity of audiovisual speech percepts

Hans-Heinrich Bothe

Research output: Contribution to conference › Conference abstract for conference › Research › peer-review


The goal of this work is to find a way to measure the similarity of audiovisual speech percepts. Phoneme-related self-organizing maps (SOMs) with a rectangular basis are trained on data from a labeled video film. The training uses a combination of auditory speech features and the corresponding visual lip features. Phoneme-related receptive fields form on the SOM basis; they are speaker-dependent and show individual locations and strain. Overlapping main slopes indicate high similarity between the respective units, whereas distortions or extra peaks originate from the influence of other units. Depending on the training data, these other units may also be the immediate contextual neighbors. The poster demonstrates the idea with text material spoken by a single subject, using a set of simple audiovisual features. The training data consist of 44 labeled German sentences with a balanced phoneme repertoire. The results show that (i) the SOMs can be trained to map auditory and visual features in a topology-preserving way and (ii) the maps show strain due to the influence of other audiovisual units. The SOMs can be used to measure similarity among audiovisual speech percepts and to measure coarticulatory effects.
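The abstract does not give the training details, so the following is only a minimal sketch of the general idea, not the author's method. It assumes a single rectangular SOM trained online with a Gaussian neighborhood on concatenated audio and lip feature vectors. A phoneme's receptive field is taken to be the normalized mean Gaussian activation of each map unit for that phoneme's tokens, and similarity is the histogram intersection of two fields. The feature layout, the activation width `tau`, the learning-rate schedule, and the toy /p/, /b/, /a/ data are all hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the map unit whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Online SOM training on a rectangular grid with a Gaussian neighborhood."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Initialize each unit with a copy of a randomly chosen training vector.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def receptive_field(weights, samples, tau=0.5):
    """Mean Gaussian activation of each unit for one phoneme, normalized to sum to 1."""
    field = [[0.0] * len(row) for row in weights]
    for x in samples:
        for r, row in enumerate(weights):
            for c, w in enumerate(row):
                d2 = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
                field[r][c] += math.exp(-d2 / (2 * tau ** 2))
    total = sum(map(sum, field)) or 1.0  # guard against an all-zero field
    return [[v / total for v in row] for row in field]


def overlap(field_a, field_b):
    """Histogram intersection in [0, 1]; 1 means identical receptive fields."""
    return sum(min(a, b) for ra, rb in zip(field_a, field_b) for a, b in zip(ra, rb))


def cluster(rng, center, n=30, noise=0.2):
    """Hypothetical phoneme tokens: noisy copies of a center vector."""
    return [[m + rng.gauss(0.0, noise) for m in center] for _ in range(n)]


# Hypothetical 4-D vectors: [audio_1, audio_2, lip_opening, lip_width].
rng = random.Random(1)
tokens = {
    "p": cluster(rng, [0.0, 0.0, 0.0, 0.0]),
    "b": cluster(rng, [0.5, 0.0, 0.0, 0.0]),  # same lip shape as /p/, slightly different audio
    "a": cluster(rng, [3.0, 3.0, 3.0, 3.0]),
}
som = train_som([x for xs in tokens.values() for x in xs], rows=6, cols=6)
fields = {ph: receptive_field(som, xs) for ph, xs in tokens.items()}
sim_pb = overlap(fields["p"], fields["b"])
sim_pa = overlap(fields["p"], fields["a"])
print(f"overlap(p, b) = {sim_pb:.2f}, overlap(p, a) = {sim_pa:.2f}")
```

With these toy clusters, the receptive fields of /p/ and /b/ overlap far more than those of /p/ and /a/, which mirrors the idea that overlapping main slopes indicate similar units. Histogram intersection is just one convenient overlap measure; any distance between normalized fields could be swapped in.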
Original language: English
Publication date: 2005
Publication status: Published - 2005
Event: Journal of the Acoustical Society of America
Duration: 1 Jan 2005 → …


