
Real-Time Environment Description Application for Visually Challenged People

  • Conference paper
  • Second International Conference on Computer Networks and Communication Technologies (ICCNCT 2019)

Abstract

Visually challenged people face a great challenge in understanding the objects and movements in their vicinity. They depend mainly on hearing and touch to recognize what is happening around them, and a well-formed verbal description of the surrounding environment is available only when a sighted person is present to provide it. We propose an application that addresses this challenging task by generating descriptions of real-time video captured from a mobile phone camera, aiding the visually challenged in their day-to-day activities. In this paper, we combine concepts from object detection and caption generation and present an approach that enables the model to run on smartphone devices in real time. The description of the objects seen in the real-time video is converted to audio as the output. We train the proposed model on several datasets so that the generated descriptions are accurate. By combining a Convolutional Neural Network and a Recurrent Neural Network with our own modifications, we create a new model. We also implement an Android application for visually challenged people to demonstrate the real-life applicability and usefulness of the neural network.
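The abstract outlines the standard encoder-decoder captioning pattern: a Convolutional Neural Network extracts features from a video frame, and a Recurrent Neural Network generates the description token by token. The sketch below illustrates that pattern only; it is not the authors' model. The ResNet-18 backbone, layer sizes, vocabulary size, and all names are illustrative assumptions.

```python
# Minimal sketch of a CNN-encoder + RNN-decoder captioning model.
# All architectural choices here are illustrative assumptions, not the
# paper's actual design.
import torch
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """Extracts a fixed-size feature vector from a video frame."""
    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet18(weights=None)  # pretrained weights optional
        # Drop the final classification layer; keep the feature extractor.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.fc = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        with torch.no_grad():  # freeze backbone for cheap on-device inference
            feats = self.backbone(images).flatten(1)
        return self.fc(feats)

class DecoderRNN(nn.Module):
    """Generates caption logits, one token at a time, from the image embedding."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Prepend the image feature as the first "token" of the sequence.
        inputs = torch.cat([features.unsqueeze(1), self.embed(captions)], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.fc(hidden)

# Usage: one 224x224 RGB frame -> logits over a hypothetical 5000-word vocabulary.
encoder = EncoderCNN(embed_size=256)
decoder = DecoderRNN(embed_size=256, hidden_size=512, vocab_size=5000)
frame = torch.randn(1, 3, 224, 224)
caption_so_far = torch.tensor([[1, 42, 7]])       # hypothetical token ids
logits = decoder(encoder(frame), caption_so_far)  # shape (1, 4, 5000)
print(logits.shape)
```

Freezing the backbone keeps per-frame inference cheap, which matters for the real-time smartphone setting the paper targets; deploying such a model on Android would typically involve a mobile runtime such as TensorFlow Lite or PyTorch Mobile, with the generated text passed to the platform's text-to-speech engine for audio output.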



Author information

Corresponding author

Correspondence to Amey Arvind Bhile.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Arvind Bhile, A., Hole, V. (2020). Real-Time Environment Description Application for Visually Challenged People. In: Smys, S., Senjyu, T., Lafata, P. (eds) Second International Conference on Computer Networks and Communication Technologies. ICCNCT 2019. Lecture Notes on Data Engineering and Communications Technologies, vol 44. Springer, Cham. https://doi.org/10.1007/978-3-030-37051-0_38


  • DOI: https://doi.org/10.1007/978-3-030-37051-0_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37050-3

  • Online ISBN: 978-3-030-37051-0

  • eBook Packages: Engineering, Engineering (R0)
