CodedVision: Towards Joint Image Understanding and Compression via End-to-End Learning

  • Conference paper
Advances in Multimedia Information Processing – PCM 2018 (PCM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11164)

Abstract

We present CodedVision, a framework that performs image content understanding and compression jointly by leveraging recent advances in deep neural networks. An eight-layer deep residual network extracts image features that serve both compression and understanding. For compression, a scalar quantizer and an entropy coder remove redundancy, and rate-distortion optimization is integrated to improve coding efficiency, with the rate estimated via a piecewise linear approximation. The framework achieves a 7.8% BD-Rate (Bjontegaard delta rate) gain over state-of-the-art HEVC intra-based image compression. For content understanding, a second residual network-based classifier is attached to perform classification, with reasonable accuracy at the current stage.
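
The BD-Rate figure quoted above is conventionally computed with the Bjontegaard method: fit the logarithm of the bitrate as a cubic polynomial of PSNR for each codec, integrate both fits over the shared PSNR range, and convert the mean log-rate gap into a percentage. The following is a minimal sketch of that standard calculation, not code from the paper. The function name `bd_rate` and all sample rate and PSNR points are hypothetical and chosen only for illustration. A negative result means the test codec needs fewer bits than the anchor at equal quality.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate difference (%) of a test codec vs. an anchor at equal PSNR."""
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    psnr_a = np.asarray(psnr_anchor, dtype=float)
    psnr_t = np.asarray(psnr_test, dtype=float)

    # Fit log-rate as a cubic polynomial of PSNR for each codec.
    poly_a = np.polyfit(psnr_a, log_ra, 3)
    poly_t = np.polyfit(psnr_t, log_rt, 3)

    # Integrate both fits over the PSNR interval the two curves share.
    lo = max(psnr_a.min(), psnr_t.min())
    hi = min(psnr_a.max(), psnr_t.max())
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Mean log-rate gap over the interval, converted to a percentage.
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0


# Hypothetical rate (bits per pixel) and PSNR (dB) points, for illustration only.
anchor_rate = [0.25, 0.50, 1.00, 2.00]
psnr = [30.0, 33.0, 36.0, 39.0]
test_rate = [0.9 * r for r in anchor_rate]  # same quality at 10% fewer bits

# Prints a value close to -10, i.e. roughly a 10% bitrate saving.
print(round(bd_rate(anchor_rate, psnr, test_rate, psnr), 2))
```

Four rate points per codec is the usual minimum, since the cubic fit has four coefficients.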

This work is supported by the National Natural Science Foundation of China (Grant # 61422107, 61571215 and 61631016) and the Fundamental Research Funds for the Central Universities (Grant # 021014380053, 021014380091). Dr. Z. Ma is the corresponding author of this work.

Q. Shen, J. Cai – Authors contributed equally to this work.

Notes

  1. Given that BPG demonstrates the state-of-the-art coding efficiency, we mainly present the comparison against it.

Author information

Corresponding author

Correspondence to Zhan Ma.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Shen, Q. et al. (2018). CodedVision: Towards Joint Image Understanding and Compression via End-to-End Learning. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11164. Springer, Cham. https://doi.org/10.1007/978-3-030-00776-8_1

  • DOI: https://doi.org/10.1007/978-3-030-00776-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00775-1

  • Online ISBN: 978-3-030-00776-8

  • eBook Packages: Computer Science (R0)