Text Recognition in Videos Using a Recurrent Connectionist Approach
Most OCR (Optical Character Recognition) systems developed to recognize texts embedded in multimedia documents segment the text into characters before recognizing them. In this paper, we propose a novel approach able to avoid any explicit character segmentation. Using a multi-scale scanning scheme, texts extracted from videos are first represented by sequences of learnt features. Obtained representations are then used to feed a connectionist recurrent model specifically designed to take into account dependencies between successive learnt features and to recognize texts. The proposed video OCR evaluated on a database of TV news videos achieves very high recognition rates. Experiments also demonstrate that, for our recognition task, learnt feature representations perform better than hand-crafted features.
KeywordsVideo text recognition multi-scale image scanning ConvNet LSTM CTC
Unable to display preview. Download preview PDF.
- 2.Chen, D., Odobez, J., Bourlard, H.: Text detection and recognition in images and video frames. PR 37(3), 595–608 (2004)Google Scholar
- 3.Elagouni, K., Garcia, C., Mamalet, F., Sébillot, P.: Combining multi-scale character recognition and linguistic knowledge for natural scene text OCR. In: DAS, pp. 120–124 (2012)Google Scholar
- 4.Elagouni, K., Garcia, C., Sébillot, P.: A comprehensive neural-based approach for text recognition in videos using natural language processing. In: ICMR (2011)Google Scholar
- 6.Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)Google Scholar
- 8.Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8) (1997)Google Scholar
- 9.LeCun, Y., Bengio, Y.: Convolutional networks for images, speech, and time series. In: The Handbook of Brain Theory and Neural Networks. MIT Press (1995)Google Scholar
- 11.Saidane, Z., Garcia, C.: Automatic scene text recognition using a convolutional neural network. In: ICBDAR, pp. 100–106 (2007)Google Scholar
- 12.Yi, J., Peng, Y., Xiao, J.: Using multiple frame integration for the text recognition of video. In: ICDAR, pp. 71–75 (2009)Google Scholar