Abstract
We present CodedVision, a framework that performs image content understanding and compression jointly by leveraging recent advances in deep neural networks. An eight-layer deep residual network extracts image features that serve both compression and understanding. For compression, a scalar quantizer and an entropy coder remove redundancy, and rate-distortion optimization, with the rate estimated via a piecewise linear approximation, improves coding efficiency. The framework achieves a 7.8% BD-Rate (Bjontegaard delta rate) gain over state-of-the-art HEVC intra-based image compression. For content understanding, we attach a second residual-network-based classifier to perform classification, with reasonable accuracy at the current stage.
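To make the compression terms in the abstract concrete, the following is a minimal sketch of uniform scalar quantization combined with a Lagrangian rate-distortion cost J = D + λR. It is not the authors' implementation: the function names, the step sizes, the value of λ, and in particular the breakpoints and slopes of the piecewise linear rate proxy are all hypothetical, since the abstract does not specify them.

```python
import math


def quantize(values, step):
    """Uniform scalar quantizer: map each value to the nearest integer index."""
    return [math.floor(v / step + 0.5) for v in values]


def dequantize(indices, step):
    """Reconstruct values from quantization indices."""
    return [i * step for i in indices]


def rate_proxy(index):
    """Hypothetical piecewise linear estimate of the bits spent on one index.

    Assumption for illustration only: 2 bits per unit of magnitude up to 2,
    then 1 bit per unit beyond that, plus a 1-bit floor.
    """
    magnitude = abs(index)
    if magnitude <= 2:
        return 1.0 + 2.0 * magnitude
    return 5.0 + (magnitude - 2)


def rd_cost(values, step, lam):
    """Return (distortion, rate, J) with J = D + lam * R.

    D is the mean squared reconstruction error and R is the mean
    estimated rate per value under the proxy above.
    """
    indices = quantize(values, step)
    recon = dequantize(indices, step)
    n = len(values)
    distortion = sum((v - r) ** 2 for v, r in zip(values, recon)) / n
    rate = sum(rate_proxy(i) for i in indices) / n
    return distortion, rate, distortion + lam * rate


def best_step(values, candidate_steps, lam):
    """Pick the candidate step size with the lowest rate-distortion cost."""
    return min(candidate_steps, key=lambda s: rd_cost(values, s, lam)[2])


if __name__ == "__main__":
    features = [0.1, 0.9, -2.2, 3.6]  # stand-in for learned feature values
    for step in (0.5, 1.0, 2.0):
        d, r, j = rd_cost(features, step, lam=0.01)
        print(f"step={step}: D={d:.4f} R={r:.2f} J={j:.4f}")
    print("best step:", best_step(features, (0.5, 1.0, 2.0), lam=0.01))
```

A larger step lowers the estimated rate at the expense of distortion, and λ sets the exchange rate between the two. In an end-to-end learned codec the same trade-off would typically appear as a training loss rather than a search over step sizes.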
This work is supported by the National Natural Science Foundation of China (Grant # 61422107, 61571215 and 61631016) and the Fundamental Research Funds for the Central Universities (Grant # 021014380053, 021014380091). Dr. Z. Ma is the corresponding author of this work.
Q. Shen, J. Cai – Authors contributed equally to this work.
Notes
1. Given that BPG demonstrates the state-of-the-art coding efficiency, we mainly present the comparison against it.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Shen, Q. et al. (2018). CodedVision: Towards Joint Image Understanding and Compression via End-to-End Learning. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11164. Springer, Cham. https://doi.org/10.1007/978-3-030-00776-8_1
DOI: https://doi.org/10.1007/978-3-030-00776-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00775-1
Online ISBN: 978-3-030-00776-8
eBook Packages: Computer Science (R0)