Text Recognition in Videos Using a Recurrent Connectionist Approach

  • Khaoula Elagouni
  • Christophe Garcia
  • Franck Mamalet
  • Pascale Sébillot
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7553)


Most OCR (Optical Character Recognition) systems developed to recognize text embedded in multimedia documents segment the text into characters before recognizing them. In this paper, we propose a novel approach that avoids any explicit character segmentation. Using a multi-scale scanning scheme, texts extracted from videos are first represented as sequences of learnt features. These representations are then fed to a connectionist recurrent model specifically designed to take into account dependencies between successive learnt features and to recognize texts. The proposed video OCR, evaluated on a database of TV news videos, achieves very high recognition rates. Experiments also demonstrate that, for our recognition task, learnt feature representations perform better than hand-crafted features.
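To make the multi-scale scanning idea more concrete, the following is a minimal sketch of how a text-line image might be covered by windows of several widths at each horizontal position. The function name `multiscale_windows`, the scale factors, and the step size are illustrative assumptions, not values taken from the paper; in the described approach each window would then be passed to a learnt feature extractor.

```python
from typing import List, Tuple


def multiscale_windows(
    width: int,
    height: int,
    scales: Tuple[float, ...] = (0.5, 1.0, 1.5),  # assumed window widths, as fractions of the image height
    step: int = 4,                                 # assumed horizontal stride in pixels
) -> List[List[Tuple[int, int]]]:
    """Return, for each scanning position, one (left, right) column range per scale.

    Windows span the full image height, are centred on the scanning position,
    and are clipped to the image borders.
    """
    positions = []
    for center in range(0, width, step):
        windows = []
        for scale in scales:
            half = max(1, round(scale * height / 2))
            left = max(0, center - half)
            right = min(width, center + half)
            windows.append((left, right))
        positions.append(windows)
    return positions


if __name__ == "__main__":
    # A hypothetical 20x8 text image scanned every 10 pixels at two scales.
    for windows in multiscale_windows(20, 8, scales=(0.5, 1.0), step=10):
        print(windows)
```

Each scanning position yields one feature vector per scale, so the text image becomes a sequence whose length depends on the step size rather than on any character segmentation.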


Keywords: Video text recognition, multi-scale image scanning, ConvNet, LSTM, CTC
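The keywords mention CTC (connectionist temporal classification), which lets a recurrent network label a sequence without a prior segmentation. As a rough illustration, the sketch below shows standard best-path CTC decoding: take the most likely label per frame, merge consecutive repeats, then drop the blank label. The function name, the `symbols` layout (blank at index 0), and the example scores are assumptions made for this sketch; the paper's actual decoding procedure may differ.

```python
from itertools import groupby
from typing import Sequence


def ctc_best_path_decode(
    frame_scores: Sequence[Sequence[float]],
    symbols: str,
    blank: int = 0,
) -> str:
    """Greedy (best-path) CTC decoding.

    frame_scores[t][k] is the network score for label k at frame t.
    symbols[k] is the character for label k; the entry at `blank` is ignored.
    """
    # 1. Most likely label at each frame.
    best = [max(range(len(frame)), key=lambda k: frame[k]) for frame in frame_scores]
    # 2. Collapse runs of identical consecutive labels.
    collapsed = [label for label, _ in groupby(best)]
    # 3. Remove blanks and map the remaining labels to characters.
    return "".join(symbols[label] for label in collapsed if label != blank)


if __name__ == "__main__":
    # Hypothetical per-frame scores over the labels (blank, 'a', 'b').
    scores = [
        [0.90, 0.05, 0.05],
        [0.10, 0.80, 0.10],
        [0.20, 0.70, 0.10],
        [0.90, 0.05, 0.05],
        [0.10, 0.80, 0.10],
        [0.10, 0.10, 0.80],
        [0.10, 0.20, 0.70],
    ]
    print(ctc_best_path_decode(scores, symbols="-ab"))
```

The blank frame between the two runs of `a` is what allows a doubled letter to survive the collapsing step.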





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Khaoula Elagouni (1, 2)
  • Christophe Garcia (3)
  • Franck Mamalet (1)
  • Pascale Sébillot (2)

  1. Orange Labs R&D, Cesson Sévigné, France
  2. IRISA, INSA de Rennes, Rennes, France
  3. LIRIS, INSA de Lyon, Villeurbanne, France
