In-loop perceptual model-based rate-distortion optimization for HEVC real-time encoder

  • Qiang Hu
  • Jun Zhou
  • Xiaoyun Zhang
  • Zhiyong Gao
  • Ming-Ting Sun
Original Research Paper


In this paper, a novel High Efficiency Video Coding (HEVC)-compliant perceptual rate-distortion optimization (RDO) scheme is proposed. It is based on a motion attention model and a visual distortion sensitivity model, both of which make full use of the in-loop coding information of HEVC. The motion attention model is built from the motion vectors (MVs) estimated during the inter-prediction process. The MV field is refined by maximum a posteriori (MAP) estimation to remove MV outliers and improve the model's efficiency. The visual distortion sensitivity is modeled using the spatiotemporal energy of the AC coefficients obtained from the HEVC transform process. The two models are then incorporated into the RDO process, so that the Lagrange multiplier and the quantization parameter are adjusted adaptively in an analytical way. Since both models are calculated within the HEVC coding loop, the complexity increase is limited. The experimental results indicate that the proposed perceptual RDO scheme achieves significantly better rate-VQM performance than the conventional RDO scheme. Specifically, measured with the Bjontegaard Delta metric against the practical HEVC encoder x265, the BD-rate reduction reaches a maximum of 24.45% and an average of 13.68%.
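As a rough illustration of how a Lagrange multiplier and a quantization parameter can be adjusted together, the sketch below scales the multiplier by a perceptual weight and derives the matching QP offset. It assumes the relation commonly used in HM-style encoders, in which the multiplier is proportional to 2^((QP - 12) / 3). The function name, the `weight` parameter, and the constant `alpha` are hypothetical and are not taken from the paper, whose actual weighting is derived from the motion attention and distortion sensitivity models.

```python
import math


def perceptual_lambda_qp(base_qp, weight, qp_min=0, qp_max=51):
    """Scale the Lagrange multiplier by a perceptual weight and derive a QP.

    Assumption: lambda = alpha * 2 ** ((QP - 12) / 3), as in HM-style
    encoders. `weight` is a hypothetical per-block factor: values above 1
    mean a less sensitive region (coarser quantization), values below 1
    mean a more sensitive region (finer quantization).
    """
    if weight <= 0:
        raise ValueError("weight must be positive")

    alpha = 0.57  # illustrative constant only, not a value from the paper
    base_lambda = alpha * 2.0 ** ((base_qp - 12) / 3.0)

    # Scale the multiplier by the perceptual weight.
    new_lambda = weight * base_lambda

    # Under the assumed relation, multiplying lambda by `weight`
    # corresponds to a QP offset of 3 * log2(weight).
    delta_qp = 3.0 * math.log2(weight)

    # Round to an integer QP and clamp to the valid HEVC range.
    new_qp = max(qp_min, min(qp_max, int(round(base_qp + delta_qp))))
    return new_lambda, new_qp


if __name__ == "__main__":
    for w in (0.5, 1.0, 2.0):
        lam, qp = perceptual_lambda_qp(32, w)
        print(f"weight={w}: lambda={lam:.2f}, QP={qp}")
```

Because the constant `alpha` cancels in the offset, only the ratio between the scaled and the original multiplier determines the QP change under this assumption.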


Keywords: HEVC, Rate-distortion optimization (RDO), Motion attention, Visual distortion sensitivity



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Qiang Hu (1)
  • Jun Zhou (1)
  • Xiaoyun Zhang (1)
  • Zhiyong Gao (1)
  • Ming-Ting Sun (2)
  1. Institute of Image Communication and Network Engineering, Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China
  2. Department of Electrical Engineering, University of Washington, Seattle, USA
