AUGEN: An Ocular Support for Visually Impaired Using Deep Learning

  • Reema K. Sans
  • Reenu Sara Joseph
  • Rekha Narayanan
  • Vandhana M. Prasad (corresponding author)
  • Jisha James
Conference paper
Part of the Lecture Notes in Computational Vision and Biomechanics book series (LNCVB, volume 30)


Among the many technologies available today, mobile phones have become especially popular, and the use of mobile applications is increasing day by day. Most modern phones can capture photographs. Visually impaired people can use this capability to capture images of their surroundings; the images are then used to generate sentences that are read out to them, giving them a better knowledge of their surroundings. Because the content of an image is described to them automatically, they can avoid seeking help from the people around them. Computer vision is a field concerned with extracting information from images and videos, and it aims to perform many of the tasks that the human visual system performs. Visually impaired people can use these technologies to gain a better understanding of their surroundings.
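
The flow described above (capture an image, generate a sentence, read it aloud) can be sketched roughly as follows. This is only a minimal illustration, not the authors' implementation: the page does not show their models, so encode (standing in for a convolutional image encoder), next_word (standing in for a recurrent decoder step), and speak (standing in for a text-to-speech engine) are hypothetical callables, and greedy word-by-word decoding is an assumed choice.

```python
from typing import Callable, List, Sequence

# All names below are hypothetical stand-ins for illustration only.
Features = List[float]


def greedy_caption(
    features: Features,
    next_word: Callable[[Features, Sequence[str]], str],
    max_len: int = 20,
    start: str = "<start>",
    end: str = "<end>",
) -> str:
    """Build a caption one word at a time until the end token or max_len."""
    words = [start]
    for _ in range(max_len):
        word = next_word(features, words)
        if word == end:
            break
        words.append(word)
    return " ".join(words[1:])  # drop the start token


def describe_scene(
    image: Sequence[float],
    encode: Callable[[Sequence[float]], Features],
    next_word: Callable[[Features, Sequence[str]], str],
    speak: Callable[[str], None],
) -> str:
    """Encode the image, decode a caption, and hand it to the speech step."""
    features = encode(image)
    caption = greedy_caption(features, next_word)
    speak(caption)
    return caption


# Toy components so the sketch runs without any trained model.
def toy_encode(image: Sequence[float]) -> Features:
    return [sum(image) / len(image)]  # a single "feature": mean intensity


def toy_next_word(features: Features, words: Sequence[str]) -> str:
    canned = ["a", "dog", "on", "grass", "<end>"]
    return canned[len(words) - 1]  # ignores features; fixed sentence


spoken: List[str] = []  # collects what would be sent to text to speech
caption = describe_scene([0.1, 0.5, 0.9], toy_encode, toy_next_word, spoken.append)
```

In a real system the toy functions would be replaced by a trained encoder and decoder and by an actual speech engine; passing them in as callables keeps the three stages independent of any particular library.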


Keywords: Convolutional neural networks; Recurrent neural network; Caption generation; Text to speech



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Reema K. Sans (1)
  • Reenu Sara Joseph (1)
  • Rekha Narayanan (1)
  • Vandhana M. Prasad (1), corresponding author
  • Jisha James (1)
  1. Department of Computer Science & Engineering, Muthoot Institute of Technology & Science, Ernakulam, India
