Audiovisual Quality Fusion based on Relative Multimodal Complexity

Junyong You, Jari Korhonen, Ulrich Reiter

    Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review


    In multimodal presentations, perceived audiovisual quality is significantly influenced by the content of both the audio and video tracks. Building on our earlier subjective quality test for finding the optimal trade-off between audio and video quality, this paper proposes a novel relative multimodal complexity analysis method for deriving the fusion parameter in objective audiovisual quality metrics. Audio and video qualities are first estimated separately using advanced quality models, and then combined into an overall audiovisual quality score via linear fusion. Based on carefully designed auditory and visual features, a relative complexity analysis model across sensory modalities is proposed for deriving the fusion parameter. Experimental results demonstrate that the content-adaptive fusion parameter improves the prediction accuracy of objective audiovisual quality metrics compared to fusion parameters obtained from subjective quality tests using other known optimization methods.
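    The abstract describes a linear fusion of separately estimated audio and video qualities, with the fusion weight derived from the relative complexity of the two modalities. A minimal sketch of that idea is below; the function names, the normalized-ratio mapping from complexities to the weight, and the quality scale are illustrative assumptions, not the paper's actual model or features.

    ```python
    def complexity_based_alpha(video_complexity: float, audio_complexity: float) -> float:
        """Hypothetical mapping from relative modality complexity to a fusion
        weight in [0, 1]: the more complex modality receives more weight.
        The paper derives this parameter from dedicated auditory/visual
        features; a simple normalized ratio is used here for illustration."""
        total = video_complexity + audio_complexity
        if total <= 0.0:
            return 0.5  # no complexity information: weight both modalities equally
        return video_complexity / total

    def fuse_audiovisual_quality(audio_q: float, video_q: float, alpha: float) -> float:
        """Linear fusion of separately estimated audio and video quality scores.
        alpha weights the video quality relative to the audio quality."""
        return alpha * video_q + (1.0 - alpha) * audio_q

    # Example: a visually complex clip shifts the weight toward video quality.
    alpha = complexity_based_alpha(video_complexity=3.0, audio_complexity=1.0)
    overall = fuse_audiovisual_quality(audio_q=4.0, video_q=2.0, alpha=alpha)
    ```
    
    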
    Original language: English
    Title of host publication: 2011 18th IEEE International Conference on Image Processing (ICIP)
    Publication date: 2011
    ISBN (Print): 978-1-4577-1304-0
    ISBN (Electronic): 978-1-4577-1302-6
    Publication status: Published - 2011
    Event: 18th IEEE International Conference on Image Processing - Brussels, Belgium
    Duration: 11 Sep 2011 - 14 Sep 2011
    Conference number: 18


    Conference: 18th IEEE International Conference on Image Processing
    Series: International Conference on Image Processing. Proceedings


    • Content analysis
    • Quality fusion
    • Audiovisual quality assessment
    • Multimodal complexity


