CodedVision: Towards Joint Image Understanding and Compression via End-to-End Learning

Shen, Qiu; Cai, Juanjuan; Liu, Linfeng; Liu, Haojie; Chen, Tong; Ye, Long; Ma, Zhan

doi:10.1007/978-3-030-00776-8_1

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11164)

Included in the following conference series:

Pacific Rim Conference on Multimedia

Abstract

We present CodedVision, a framework that performs image content understanding and compression jointly by leveraging recent advances in deep neural networks. An eight-layer deep residual network extracts image features that serve both compression and understanding. For compression, a scalar quantizer and an entropy coder remove redundancy, and rate-distortion optimization is integrated to improve coding efficiency, with the rate estimated via a piecewise linear approximation. The framework achieves a 7.8% BD-Rate (Bjontegaard delta rate) gain over state-of-the-art HEVC intra-based image compression. For content understanding, we attach another residual-network-based classifier to perform classification, which reaches reasonable accuracy at the current stage.
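To make the BD-Rate figure in the abstract concrete, the following is a minimal sketch of the standard Bjontegaard calculation: fit log-rate as a cubic polynomial of PSNR for each codec, integrate both fits over the shared PSNR range, and convert the average log-rate gap to a percentage. It is not the authors' evaluation code. The function name bd_rate and the sample rate/PSNR points are hypothetical and chosen only for illustration. Under the usual sign convention, a negative result means the test codec needs fewer bits than the anchor at equal quality.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate difference (%) of the test codec versus the anchor
    at equal PSNR, following the standard Bjontegaard procedure."""
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))

    # Only the PSNR interval covered by both curves is compared.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())

    # Center the PSNR axis so the cubic fit is numerically well conditioned.
    center = 0.5 * (lo + hi)
    poly_a = np.polyfit(pa - center, log_ra, 3)
    poly_t = np.polyfit(pt - center, log_rt, 3)

    # Integrate each fitted log-rate curve over the shared interval.
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    a, b = lo - center, hi - center
    area_a = np.polyval(int_a, b) - np.polyval(int_a, a)
    area_t = np.polyval(int_t, b) - np.polyval(int_t, a)

    # Mean log-rate gap, converted back to a percentage.
    avg_log_diff = (area_t - area_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical rate (bits per pixel) and PSNR (dB) points, not from the paper.
    anchor_rate = [0.25, 0.5, 1.0, 2.0]
    test_rate = [0.9 * r for r in anchor_rate]
    psnr = [30.0, 33.0, 36.0, 39.0]
    print(f"BD-Rate: {bd_rate(anchor_rate, psnr, test_rate, psnr):.2f}%")
```

In this toy setup the test codec spends 10% fewer bits at every PSNR point, so the result should come out at about -10%. Real evaluations use measured rate-distortion points, typically four per codec.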

This work is supported by the National Natural Science Foundation of China (Grant # 61422107, 61571215 and 61631016) and the Fundamental Research Funds for the Central Universities (Grant # 021014380053, 021014380091). Dr. Z. Ma is the corresponding author of this work.

Q. Shen and J. Cai contributed equally to this work.

Notes

1. Given that BPG demonstrates state-of-the-art coding efficiency, we mainly present the comparison against it.

Author information

Authors and Affiliations

Nanjing University, Nanjing, China
Qiu Shen, Linfeng Liu, Haojie Liu, Tong Chen & Zhan Ma
Communication University of China, Beijing, China
Juanjuan Cai & Long Ye

Corresponding author

Correspondence to Zhan Ma.

Editor information

Editors and Affiliations

Hefei University of Technology, Hefei, China
Richang Hong
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
University of Tokyo, Tokyo, Japan
Toshihiko Yamasaki
Hefei University of Technology, Hefei, China
Meng Wang
City University of Hong Kong, Hong Kong, Hong Kong
Chong-Wah Ngo

About this paper

Cite this paper

Shen, Q. et al. (2018). CodedVision: Towards Joint Image Understanding and Compression via End-to-End Learning. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11164. Springer, Cham. https://doi.org/10.1007/978-3-030-00776-8_1

DOI: https://doi.org/10.1007/978-3-030-00776-8_1
Published: 19 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00775-1
Online ISBN: 978-3-030-00776-8
eBook Packages: Computer Science, Computer Science (R0)
