Abstract
In the real world, visually challenged people face the considerable challenge of understanding the objects and movements in their vicinity. They depend mainly on hearing and touch to recognize what is happening around them; a well-formed verbal description of the surrounding environment is available only when a sighted person is present to provide it. We propose an application that addresses this task by generating descriptions of real-time video captured from a mobile phone camera, aiding the visually challenged in their day-to-day activities. In this paper, we combine concepts from object detection and caption generation and present an approach that enables the model to run on smartphone devices in real time. The descriptions generated for the objects seen in the real-time video are converted to audio output. We train the proposed model on several datasets so that the generated descriptions are accurate. By combining a Convolutional Neural Network with a Recurrent Neural Network, together with our own modifications, we create a new model. We also implement an Android application for visually challenged people to demonstrate the real-life applicability and usefulness of the neural network.
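To make the described pipeline concrete, the following is a minimal sketch of the CNN-encoder/RNN-decoder pattern the abstract refers to, written in PyTorch. The paper's exact architecture, backbone, vocabulary, and modifications are not specified here, so the ResNet-18 backbone, layer sizes, and all names below are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a CNN encoder + RNN decoder for caption generation.
# Assumes PyTorch/torchvision; all architectural choices are illustrative.
import torch
import torch.nn as nn
import torchvision.models as models


class EncoderCNN(nn.Module):
    """Extracts a fixed-size feature vector from a video frame."""
    def __init__(self, embed_size):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        # Drop the classification head; keep the convolutional features.
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])
        self.fc = nn.Linear(backbone.fc.in_features, embed_size)

    def forward(self, images):            # images: (B, 3, 224, 224)
        with torch.no_grad():             # frozen pretrained backbone
            features = self.cnn(images).flatten(1)
        return self.fc(features)          # (B, embed_size)


class DecoderRNN(nn.Module):
    """Generates a caption, one word at a time, from image features."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):   # captions: (B, T) word ids
        # Prepend the image features as the first step of the sequence.
        inputs = torch.cat([features.unsqueeze(1),
                            self.embed(captions)], dim=1)
        hiddens, _ = self.lstm(inputs)
        return self.fc(hiddens)               # (B, T+1, vocab_size) logits
```

In a deployment such as the one the abstract describes, a model of this kind would be exported to a mobile-friendly runtime and its decoded caption passed to the platform's text-to-speech service; the details of that step depend on the target device and are not covered by this sketch.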