Accelerating Decoding Step in Image Captioning on Smartphones

  • Conference paper
  • In: High-Performance Computing and Big Data Analysis (TopHPC 2019)

Abstract

In recent years, considerable effort has been devoted to increasing the accuracy of neural image captioning, one of the diverse applications of deep neural networks. Text-based image retrieval is one important application of image captioning; improving the quality of life for visually impaired people is another. Accordingly, fast and well-optimized implementations that run effectively on mobile processors are necessary. Despite the numerous image captioning approaches presented so far, few solutions take mobile computational capabilities into account. In this paper, we focus on a practical implementation of the decoding step of image captioning in Android applications. Iteration over variable-length sequences can be expressed with dynamic control flow, which avoids unrolling the computation to a fixed maximum length. Exploiting this facility speeds up the decoding routine of image captioning on smartphone devices. Experimental results on execution time validate the proposed approach.
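
The mechanism behind the claimed speedup can be illustrated concretely. TensorFlow exposes dynamic control flow through tf.while_loop, which lets a graph iterate until a runtime condition fails rather than executing a statically unrolled sequence of steps. The sketch below shows a greedy LSTM caption decoder built this way; it is a minimal illustration under assumed settings (the vocabulary size, dimensions, token ids, and seeding of the LSTM state with the image features are all illustrative choices, not the authors' published implementation).

```python
# Minimal sketch: greedy caption decoding with TensorFlow 1.x dynamic
# control flow (tf.while_loop). All sizes, token ids, and the state
# initialization are illustrative assumptions.
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 10000, 256, 512
MAX_LEN, START_ID, END_ID = 20, 1, 2

embeddings = tf.get_variable("embeddings", [VOCAB_SIZE, EMBED_DIM])
cell = tf.nn.rnn_cell.LSTMCell(HIDDEN_DIM)
W_out = tf.get_variable("W_out", [HIDDEN_DIM, VOCAB_SIZE])
b_out = tf.get_variable("b_out", [VOCAB_SIZE])

# Encoder output for a single image; here it simply seeds the LSTM state.
image_features = tf.placeholder(tf.float32, [1, HIDDEN_DIM])
init_state = tf.nn.rnn_cell.LSTMStateTuple(c=tf.zeros_like(image_features),
                                           h=image_features)

def cond(t, token, state, done, tokens):
    # Stop as soon as <end> is produced or the length cap is reached.
    # An unrolled graph would execute all MAX_LEN steps regardless.
    return tf.logical_and(t < MAX_LEN, tf.logical_not(done))

def body(t, token, state, done, tokens):
    inputs = tf.nn.embedding_lookup(embeddings, token)   # [1, EMBED_DIM]
    output, state = cell(inputs, state)                  # one LSTM step
    logits = tf.matmul(output, W_out) + b_out            # [1, VOCAB_SIZE]
    next_token = tf.argmax(logits, axis=-1, output_type=tf.int32)
    done = tf.reduce_any(tf.equal(next_token, END_ID))
    return t + 1, next_token, state, done, tokens.write(t, next_token)

loop_vars = [tf.constant(0),                                # step counter
             tf.constant([START_ID], dtype=tf.int32),       # <start> token
             init_state,
             tf.constant(False),                            # done flag
             tf.TensorArray(tf.int32, size=0, dynamic_size=True)]
_, _, _, _, tokens = tf.while_loop(cond, body, loop_vars)
caption_ids = tokens.stack()  # caption length is decided at run time
```

Because the loop terminates as soon as the end-of-sentence token appears, a short caption costs proportionally fewer LSTM steps on the phone's processor, whereas an unrolled decoder always pays for the maximum length; this is the effect the abstract attributes to dynamic control flow.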


Notes

  1. Samsung Galaxy J5.

  2. Huawei Honor 6x.

  3. Samsung Galaxy J5.


Author information

Correspondence to Behnam Samadi.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Samadi, B., Mansouri, A., Mahmoudi-Aznaveh, A. (2019). Accelerating Decoding Step in Image Captioning on Smartphones. In: Grandinetti, L., Mirtaheri, S., Shahbazian, R. (eds) High-Performance Computing and Big Data Analysis. TopHPC 2019. Communications in Computer and Information Science, vol 891. Springer, Cham. https://doi.org/10.1007/978-3-030-33495-6_33

  • DOI: https://doi.org/10.1007/978-3-030-33495-6_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33494-9

  • Online ISBN: 978-3-030-33495-6
