Deep Video Quality Assessor: From Spatio-Temporal Visual Sensitivity to a Convolutional Neural Aggregation Network

  • Woojae Kim
  • Jongyoo Kim
  • Sewoong Ahn
  • Jinwoo Kim
  • Sanghoon Lee
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11205)


Incorporating spatio-temporal human visual perception into video quality assessment (VQA) remains a formidable challenge. Previous statistical and computational models of spatio-temporal perception are of limited use in general VQA algorithms. In this paper, we propose a novel full-reference (FR) VQA framework named Deep Video Quality Assessor (DeepVQA), which quantifies spatio-temporal visual perception via a convolutional neural network (CNN) and a convolutional neural aggregation network (CNAN). Our framework learns spatio-temporal sensitivity behavior directly from subjective scores. In addition, to handle the temporal variation of distortions, we propose a novel temporal pooling method based on an attention model. In our experiments, DeepVQA achieves state-of-the-art prediction accuracy of more than 0.9 correlation, which is \(\sim \)5% higher than that of conventional methods on the LIVE and CSIQ video databases.
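The attention-based temporal pooling described above can be illustrated with a minimal sketch. This is not the paper's exact CNAN architecture; it is a generic attention-weighted aggregation over per-frame features, assuming a single learned query vector (the paper uses convolutional attention kernels learned end-to-end):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_temporal_pool(frame_feats, query):
    """Aggregate per-frame features into one video-level descriptor.

    frame_feats: (T, D) array, one feature vector per frame.
    query: (D,) vector standing in for a learned attention kernel.
    Returns the pooled (D,) descriptor and the (T,) attention weights.
    """
    scores = frame_feats @ query           # relevance score per frame
    weights = softmax(scores)              # weights sum to 1 across frames
    return weights @ frame_feats, weights  # weighted average of frame features

# Toy example: 4 frames with 3-D features; the last frame scores highest,
# so it receives the largest pooling weight.
feats = np.array([[1., 0., 0.],
                  [0., 1., 0.],
                  [0., 0., 1.],
                  [1., 1., 1.]])
query = np.array([1., 1., 1.])
pooled, w = attention_temporal_pool(feats, query)
```

In contrast to uniform mean pooling, frames whose features align with the query (e.g. frames with severe, perceptually salient distortion) dominate the video-level descriptor, which is the intuition behind pooling adaptively over temporal distortion variation.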


Video quality assessment · Visual sensitivity · Convolutional neural network · Attention mechanism · HVS · Temporal pooling



This work was supported by Institute for Information & communications Technology Promotion through the Korea Government (MSIP) (No. 2016-0-00204, Development of mobile GPU hardware for photo-realistic real-time virtual reality).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Woojae Kim (1)
  • Jongyoo Kim (2)
  • Sewoong Ahn (1)
  • Jinwoo Kim (1)
  • Sanghoon Lee (1)
  1. Department of Electrical and Electronic Engineering, Yonsei University, Seoul, Republic of Korea
  2. Microsoft Research, Beijing, China