Abstract
In recent years, much effort has gone into increasing the accuracy of neural image captioning, one of the many applications of deep neural networks. Text-based image retrieval is one important application of image captioning. Improving the quality of life of visually impaired people is another. Consequently, fast and efficient implementations that run effectively on mobile processors are needed. Despite the numerous image captioning approaches presented so far, few take the computational capabilities of mobile devices into account. In this paper, we focus on the decoding step of image captioning as implemented in Android applications. Iteration over variable-length sequences can be performed using dynamic control flow, and implementing such iterative algorithms with dynamic control flow avoids unrolling the computation to a fixed maximum length. Using this facility speeds up the decoding routine of image captioning on smartphones. Experimental results on execution time validate the proposed approach.
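The contrast between unrolling and dynamic control flow can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' Android implementation: `toy_step` is a hypothetical stand-in for one LSTM decoding step, and the token IDs `START`/`END` and the 20-step maximum are illustrative choices. Both decoders return the same caption, but the unrolled version always executes `max_len` steps, while the dynamic version stops as soon as the end token appears.

```python
from typing import Callable, List, Tuple

# Hypothetical token IDs (illustrative only).
START, END = 0, 1

# A decoding step maps (previous token, state) -> (next token, new state).
StepFn = Callable[[int, int], Tuple[int, int]]


def toy_step(prev_token: int, state: int) -> Tuple[int, int]:
    """Hypothetical stand-in for one LSTM decoding step.

    Emits the words 2, 3, 4, 5, 6 and then END. A real captioner would
    run an LSTM cell and pick a word from a softmax over the vocabulary.
    """
    state += 1
    next_token = END if state > 5 else state + 1
    return next_token, state


def decode_unrolled(step: StepFn, max_len: int) -> Tuple[List[int], int]:
    """Mimics a statically unrolled graph: all max_len steps execute,
    even after END; outputs produced after END are simply discarded."""
    tokens: List[int] = []
    token, state, steps_run, finished = START, 0, 0, False
    for _ in range(max_len):
        token, state = step(token, state)
        steps_run += 1
        if not finished:
            if token == END:
                finished = True
            else:
                tokens.append(token)
    return tokens, steps_run


def decode_dynamic(step: StepFn, max_len: int) -> Tuple[List[int], int]:
    """Mimics dynamic control flow: the loop condition is evaluated at
    every iteration, so decoding stops as soon as END is produced."""
    tokens: List[int] = []
    token, state, steps_run = START, 0, 0
    while steps_run < max_len:
        token, state = step(token, state)
        steps_run += 1
        if token == END:
            break
        tokens.append(token)
    return tokens, steps_run


if __name__ == "__main__":
    print(decode_unrolled(toy_step, 20))  # ([2, 3, 4, 5, 6], 20)
    print(decode_dynamic(toy_step, 20))   # ([2, 3, 4, 5, 6], 6)
```

Here the dynamic decoder runs 6 steps where the unrolled one runs 20 for the same five-word caption, and that saving grows with the gap between the maximum length and the typical caption length. In a TensorFlow graph, the dynamic variant roughly corresponds to expressing the loop with `tf.while_loop` instead of a Python loop unrolled at graph-construction time. Whether this mapping matches the paper's exact implementation is an assumption.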
Notes
1. Samsung Galaxy J5.
2. Huawei Honor 6x.
3. Samsung Galaxy J5.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Samadi, B., Mansouri, A., Mahmoudi-Aznaveh, A. (2019). Accelerating Decoding Step in Image Captioning on Smartphones. In: Grandinetti, L., Mirtaheri, S., Shahbazian, R. (eds) High-Performance Computing and Big Data Analysis. TopHPC 2019. Communications in Computer and Information Science, vol 891. Springer, Cham. https://doi.org/10.1007/978-3-030-33495-6_33
DOI: https://doi.org/10.1007/978-3-030-33495-6_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33494-9
Online ISBN: 978-3-030-33495-6
eBook Packages: Computer Science, Computer Science (R0)