Abstract
In recent years, text recognition in natural scene images and video frames has received increasing attention from researchers because of its many complexities and challenges. Low resolution, blur, complex backgrounds, varied fonts and colors, and variable text alignment within images and video frames make recognition in such scenarios difficult. Most current approaches apply a binarization algorithm to convert the input into a binary image and then apply OCR to obtain the recognition result. In this paper, we present a novel approach to text recognition in scene images and video frames based on color channel selection. A color channel is first selected automatically, and the selected channel is then used for text recognition. Our text recognition framework is based on a Hidden Markov Model (HMM) that uses Pyramidal Histogram of Oriented Gradient features extracted from the selected color channel. For each sliding window of a color channel, our color channel selection approach analyzes the image properties of the window, and a multi-label Support Vector Machine (SVM) classifier is then applied to select the color channel that will provide the best recognition result in that window. Selecting a color channel for each sliding window was found to be more effective than using a single color channel for the whole word image. Five different features were analyzed for multi-label SVM based color channel selection, of which the wavelet transform based feature outperformed the others. Our color channel selection framework is script-independent and has been tested on English (Roman) and Devanagari (Indic) scripts. We tested our approach on publicly available English datasets covering both video and scene images (ICDAR 2003, ICDAR 2013, MSRA-TD500, IIIT5K, SVT, YVT). For Devanagari script, we collected our own dataset. The experimental results are encouraging and show the advantage of the proposed method.
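The per-window selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, window sizes, and the one-level Haar transform used as the wavelet feature are all assumptions, and the trained multi-label SVM is replaced by a simple stand-in rule (pick the channel with the largest detail energy) so that the example runs without training data.

```python
from typing import Callable, List, Tuple

Channel = List[List[float]]  # one color channel as a rows x cols grid


def sliding_windows(width: int, win: int, step: int) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges that cover a word image of this width."""
    if width <= win:
        return [(0, width)]
    starts = list(range(0, width - win + 1, step))
    if starts[-1] + win < width:  # make sure the right edge is covered
        starts.append(width - win)
    return [(s, s + win) for s in starts]


def haar_energies(patch: Channel) -> List[float]:
    """One-level 2-D Haar transform of a patch.

    Returns the mean energy of four sub-bands: approximation,
    horizontal difference, vertical difference, diagonal difference.
    """
    rows = len(patch) - len(patch) % 2  # drop an odd trailing row/column
    cols = len(patch[0]) - len(patch[0]) % 2
    sums = [0.0, 0.0, 0.0, 0.0]
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            a, b = patch[r][c], patch[r][c + 1]
            d, e = patch[r + 1][c], patch[r + 1][c + 1]
            coeffs = (
                (a + b + d + e) / 2.0,  # approximation
                (a - b + d - e) / 2.0,  # horizontal difference
                (a + b - d - e) / 2.0,  # vertical difference
                (a - b - d + e) / 2.0,  # diagonal difference
            )
            for i, v in enumerate(coeffs):
                sums[i] += v * v
    n = max(1, (rows // 2) * (cols // 2))
    return [s / n for s in sums]


def detail_energy_selector(features: List[List[float]]) -> int:
    """Stand-in for the trained classifier (an assumption, not the paper's SVM):
    choose the channel whose window has the most detail energy."""
    scores = [sum(f[1:]) for f in features]
    return scores.index(max(scores))


def select_channels(
    channels: List[Channel],
    win: int = 4,
    step: int = 2,
    selector: Callable[[List[List[float]]], int] = detail_energy_selector,
) -> List[int]:
    """Return the index of the selected color channel for each sliding window."""
    width = len(channels[0][0])
    choices = []
    for start, end in sliding_windows(width, win, step):
        feats = [haar_energies([row[start:end] for row in ch]) for ch in channels]
        choices.append(selector(feats))
    return choices
```

In a fuller pipeline, `selector` would be replaced by a classifier trained on windows labeled with the channel that gave the best recognition result, and the features used for recognition would then be extracted from the chosen channel in each window.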
Cite this article
Bhunia, A.K., Kumar, G., Roy, P.P. et al. Text recognition in scene image and video frame using Color Channel selection. Multimed Tools Appl 77, 8551–8578 (2018). https://doi.org/10.1007/s11042-017-4750-6
DOI: https://doi.org/10.1007/s11042-017-4750-6