Viewport Video Quality Assessment Database

Written by  |  10/01/2020 - 16:55


The recent development of a variety of powerful virtual reality (VR) devices has popularized panoramic and omnidirectional videos and images (ODV or ODI), which give users a high degree of freedom (i.e., 3DoF and 6DoF) to navigate inside the videos. During viewing, users can only see the part of the ODV inside the current viewport. However, quality assessment metrics designed for traditional planar video (e.g., PSNR, MS-SSIM, VIFP, VSI, FSIM and QSTAR) do not work well in the immersive viewing environment when we use them to measure the quality loss in our study. Based on this fact, we develop a model that expresses the joint impact of spatial resolution (SR), temporal resolution (TR) and amplitude resolution (quantization parameter, QP), together referred to as STAR, on immersive perceptual quality.

Model Description

We performed subjective quality assessments on salient viewport videos extracted from the ODVs with the HTC Vive system. To generate the rating sequences, we sampled the viewport videos extracted from the entire ODVs to different spatial, temporal and amplitude resolutions. We then obtained the corresponding mean opinion score (MOS) by conducting a subjective quality assessment with the videos in these STAR combinations. The viewport videos we extracted are shown at the bottom of the page.
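The sampling procedure above can be sketched as enumerating every (SR, TR, QP) combination for each viewport video. The specific grid values below are illustrative assumptions; the actual values used in the study are not listed on this page.

```python
from itertools import product

# Hypothetical sampling grids (the study's actual values may differ).
spatial_res = [1.0, 0.75, 0.5]    # normalized spatial resolution s
temporal_res = [30, 20, 15]       # frame rate (fps) for temporal resolution t
qp_values = [22, 27, 32, 37]      # quantization parameters q

# Every (s, t, q) combination yields one rating sequence for the subjective test.
star_combinations = list(product(spatial_res, temporal_res, qp_values))
print(len(star_combinations))  # 3 * 3 * 4 = 36 combinations per viewport video
```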


        Fig. 1.  Illustration of selecting saliency region

We divide the overall model into three parts corresponding to the impact of SR, QP and TR, as follows. Each part contains only one variable parameter, which can be calculated quickly from viewport-video features. All the remaining model parameters are fixed via least squared error (LSE) fitting against the MOS data we obtained. Note that the s, q and t we take into the calculation are all normalized.
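The fitting step can be sketched as follows. The multiplicative structure and the exponential factor forms below are an assumption for illustration only; the authors' exact functional forms are given in the figures on this page, and the synthetic data stand in for the real MOS.

```python
import numpy as np
from scipy.optimize import curve_fit

# A STAR-style multiplicative model sketch: overall quality is the product of
# three factors, each driven by one normalized variable (s, t, q) and one
# content-dependent parameter (a, b, c). Each factor equals 1 at full resolution.
def star_quality(X, a, b, c):
    s, t, q = X
    Qs = (1 - np.exp(-a * s)) / (1 - np.exp(-a))   # spatial factor, Qs(1) = 1
    Qt = (1 - np.exp(-b * t)) / (1 - np.exp(-b))   # temporal factor, Qt(1) = 1
    Qq = np.exp(-c * (1 - q))                      # amplitude factor, Qq(1) = 1
    return Qs * Qt * Qq

# Fix the parameters by least squared error (LSE) fitting against MOS data,
# as described above. Synthetic MOS values are used here for illustration.
rng = np.random.default_rng(0)
s = rng.uniform(0.3, 1.0, 50)
t = rng.uniform(0.3, 1.0, 50)
q = rng.uniform(0.3, 1.0, 50)
mos = star_quality((s, t, q), 4.0, 3.0, 2.0) + rng.normal(0, 0.01, 50)

params, _ = curve_fit(star_quality, (s, t, q), mos, p0=[1.0, 1.0, 1.0])
```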



To deal with the varying viewport location when viewing ODVs, we also propose a tile-level parameter prediction method to quickly predict the content-dependent parameters in a specific viewport. We can derive the content-dependent parameters from the location of the current viewport and pre-calculated data in each tile. This operation does not affect the accuracy of our proposed model.
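One plausible realization of such tile-level prediction is sketched below; the page does not spell out the exact formula, so the overlap-area-weighted mean used here is an assumption of the mechanism.

```python
import numpy as np

# Sketch: predict a viewport's content-dependent parameter from pre-calculated
# per-tile parameters, weighting each tile by how much of the viewport it covers.
def predict_viewport_param(tile_params, overlap_areas):
    """tile_params: pre-calculated parameter value for each covered tile.
    overlap_areas: area of intersection between the viewport and each tile."""
    tile_params = np.asarray(tile_params, dtype=float)
    overlap_areas = np.asarray(overlap_areas, dtype=float)
    return float((tile_params * overlap_areas).sum() / overlap_areas.sum())

# Example: a viewport covering four tiles with different overlap areas.
p = predict_viewport_param([2.0, 3.0, 4.0, 5.0], [0.5, 0.2, 0.2, 0.1])
```

Because the per-tile data are pre-computed, only the weighted sum has to be evaluated when the viewport moves.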


Model Performance

To further validate the viewport-based quality model, we performed another subjective quality assessment with another 10 viewport videos. Thanks to the tile-based prediction, our proposed model also correlates well with the MOS data we obtained for validation (PCC > 0.95, SRCC > 0.95 and rRMSE < 8.50%).

Quality Model for ODV

The viewport-based quality model can easily be extended to infer the overall ODV quality by linearly weighting the saliency-aggregated qualities of salient viewports and the quality of the quick-scanning (or non-salient) area. Experiments show that the inferred model correlates very well with the collected MOS, with performance competitive with the state-of-the-art algorithms, across another four independent and third-party ODV assessment datasets, including IPP_IVQD, IVQAD2017, ISTOmnidirectional and VQA-ODV.

A brief explanation of the processing is as follows:

(1) Saliency prediction.

(2) Viewport extraction and saliency weight calculation.

(3) Linear weighted aggregation.

(4) Quality of quick-scanning area (non-salient viewports).

(5) Linear combination of quality contributions from salient viewports and quick-scanning area.
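Steps (3)-(5) above can be sketched as a saliency-weighted aggregation followed by a linear combination. The weight values below are hypothetical; in the model they come from fitting against MOS data.

```python
import numpy as np

# Sketch of the aggregation: combine salient viewport qualities by their
# saliency weights, then mix with the quick-scanning (non-salient) quality.
def odv_quality(viewport_qualities, saliency_weights, q_nonsalient, alpha=0.8):
    w = np.asarray(saliency_weights, dtype=float)
    q = np.asarray(viewport_qualities, dtype=float)
    q_salient = (w * q).sum() / w.sum()        # step (3): linear weighted aggregation
    # step (5): linear combination of salient and quick-scanning contributions
    return alpha * q_salient + (1 - alpha) * q_nonsalient

# Example: three salient viewports plus a quick-scanning area quality of 3.0.
score = odv_quality([4.0, 3.5, 4.5], [0.5, 0.3, 0.2], q_nonsalient=3.0)
```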

                                             Fig. 3.  Illustration of Saliency Aggregation

The videos we used, the corresponding MOS data and the Matlab code are available for download here or on GitHub.

The videos we used are as follows:









KiteFlite (Training Video)