Multimedia Tools and Applications, Volume 51, Issue 1, pp 163–186

Automatic prediction of perceptual quality of multimedia signals—a survey

  • Kalpana Seshadrinathan
  • Alan Conrad Bovik


We survey recent developments in multimedia signal quality assessment, covering image, audio, video, and combined signals. Such an overview is timely given the recent explosion of all-digital sensory entertainment and communication devices in the consumer space. Because these signals are sensory in nature, perceptual models lie at the heart of multimedia signal quality assessment algorithms. We survey these models and recent competitive algorithms, and we discuss comparison studies that others have conducted. In this context we also describe existing signal quality assessment databases. We envision that the reader will gain a firmer understanding of the broad topic of multimedia quality assessment, of the various sub-disciplines corresponding to different signal types, of how these signal types relate to one another in producing an overall user experience, and of what directions of research remain to be pursued.


Survey; Quality assessment; Video quality; Image quality; Structural SIMilarity; Motion-based video integrity evaluation; Audio quality; Full reference; Perception



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Intel Corporation, Santa Clara, USA
  2. Dept. of Electrical and Computer Engg., The University of Texas at Austin, Austin, USA
