Perceptual Evaluation of Immersive Audiovisual Quality

Randy Frans Fela

Research output: Book/Report › Ph.D. thesis



Omnidirectional multimedia (e.g., 360° video and spatial audio) provide a higher level of viewing experience and audiovisual spatial impression than traditional media, but also present a number of challenges for perceptual evaluation. It is well known that overall perceived audiovisual quality is composed of individual factors such as the perceived quality of audio and video, as well as various terms derived from these two factors. However, because the two domains are traditionally evaluated separately in omnidirectional multimedia, holistic audiovisual quality perception and its prediction models are not well understood. This dissertation investigates the perceptual multimedia quality of 360° videos with spatial audio, played back via head-mounted displays and multichannel loudspeakers, by conducting perceptual experiments with human subjects and computing prediction metrics. While the primary goal of the Ph.D. study is to understand how the factors underlying multimedia quality are perceived by humans, our preliminary research suggests the need for strategies addressing the elements that support perceptual evaluation of omnidirectional multimedia, such as the availability of synchronized audio-video source/test material, the capability of the assessor panel, the efficient design of experiments when conducting multimedia evaluation with varying audiovisual variables, and audiovisual quality assessment data. As a result, in conjunction with the Higher-Order Ambisonics–Sound Scene Repository (HOA-SSR) database project, a novel audiovisual quality assessment dataset that includes subjective and objective scores along with reference and degraded audiovisual material was developed. A pilot study conducted as part of this work showed that an audiovisual experiment is a challenging task. In particular, because of the unique spatial and compressive properties of 360° audiovisual media, it is necessary to have a panel of assessors qualified to perform audiovisual tasks, so that the number of assessors assigned keeps the experiment at a manageable size while maintaining the quality of the data collected.
Therefore, a group of “selected assessors” was established and validated using our framework for training naive assessors to become selected assessors, who are expected to be more capable of providing reliable and detailed subjective quality data. The results of the comparative study show that a group of “selected” assessors can provide more reliable and discriminative results (mean opinion scores with 95% confidence intervals) than a group of “non-selected” assessors, commonly referred to as consumers or naive assessors. In multimodal perceptual evaluation, the number of factors and factor levels can increase rapidly, so that a conventional experimental design becomes unwieldy and a more efficient alternative is desirable. The present study addresses this challenge by applying optimal experimental design (OED) to determine the extent to which this approach can be useful in perceptual evaluation studies. The results of a comparative study with the full factorial design (FFD) suggest that OED requires fewer data points (41.6% efficiency) and yields a more manageable experiment, while maintaining a statistical quality of experimental data that is competitive with that of FFD. The scientific publications included in this thesis report experiments on perceptual evaluation. In the first experiment, audio bit rate, video resolution and quantization parameters are reported to be influencing factors in the case study. Analysis of audiovisual quality models showed that the power model outperformed the other models in subjective quality (PCC=0.930, SROCC=0.935) and objective quality (PCC=0.911, SROCC=0.924). When objective quality metrics are used together with subjective ratings (MOS_AV), the VMAF–AMBIQUAL combination shows the highest performance compared to other combinations (PCC=0.878, SROCC=0.873). Moreover, performance increases to PCC=0.909, SROCC=0.914 when a Support Vector Machine prediction model with 10-fold cross-validation is used.
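As an illustration of the "mean opinion score with a 95% confidence interval" used to compare assessor groups, the following is a minimal sketch and not the procedure used in the thesis. The function name `mos_with_ci`, the example ratings, and the normal approximation for the interval are assumptions made here; for small panels a Student's t critical value is often preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mos_with_ci(ratings, confidence=0.95):
    """Return (MOS, CI half-width) for the ratings of one stimulus.

    Assumption: the interval uses a normal approximation, which is
    only a rough guide for small assessor panels.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("at least two ratings are needed")
    mos = mean(ratings)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(ratings) / sqrt(n)
    return mos, half_width


# Hypothetical ratings on a 5-point scale from eight assessors.
example_ratings = [4, 5, 3, 4, 4, 5, 3, 4]
example_mos, example_ci = mos_with_ci(example_ratings)
print(f"MOS = {example_mos:.2f} +/- {example_ci:.2f}")
```

A narrower interval for the same stimulus indicates a more consistent panel, which is one way the reliability of two assessor groups could be compared.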
Finally, the last study proposes a newly developed dataset and evaluates its quality. An initial benchmark with the dataset shows that the newer version of VISQOL has a higher correlation with subjective data than AMBIQUAL, and that a version of VMAF4K outperforms the other video quality metrics.
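The PCC and SROCC figures quoted above are the Pearson and Spearman rank correlation coefficients between predicted and subjective scores. The sketch below shows how such coefficients are computed in general; it is not the thesis's evaluation code, and the function names and sample values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient (PCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation (SROCC): PCC of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical objective metric scores and subjective MOS values.
metric_scores = [35.0, 52.0, 61.0, 78.0, 90.0]
subjective_mos = [1.8, 2.9, 3.1, 4.2, 4.6]
print(f"PCC   = {pearson(metric_scores, subjective_mos):.3f}")
print(f"SROCC = {spearman(metric_scores, subjective_mos):.3f}")
```

PCC reflects how linear the relationship is, while SROCC only reflects whether the metric orders the stimuli the same way the assessors do, which is why both are usually reported together.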
Original language: English
Publisher: Technical University of Denmark
Number of pages: 177
Publication status: Published - 2022


  • Omnidirectional multimedia
  • Audiovisual quality
  • Audiovisual quality dataset
  • Optimal experimental design
  • Perceptual quality model

