Text recognition in scene image and video frame using Color Channel selection

Bhunia, Ayan Kumar; Kumar, Gautam; Roy, Partha Pratim; Balasubramanian, R.; Pal, Umapada

doi:10.1007/s11042-017-4750-6

Published: 05 May 2017

Volume 77, pages 8551–8578 (2018)

Multimedia Tools and Applications

Abstract

In recent years, recognition of text in natural scene images and video frames has received increasing attention from researchers because of its many complexities and challenges. Low resolution, blurring, complex backgrounds, varied fonts and colors, and variable text alignment within images and video frames make text recognition in such scenarios difficult. Most current approaches first apply a binarization algorithm to convert the image into a binary image and then apply OCR to obtain the recognition result. In this paper, we present a novel approach based on color channel selection for text recognition in scene images and video frames. In this approach, a color channel is first selected automatically, and the selected channel is then used for text recognition. Our text recognition framework is based on a Hidden Markov Model (HMM) that uses Pyramidal Histogram of Oriented Gradients (PHOG) features extracted from the selected color channel. For each sliding window, our color channel selection approach analyzes the image properties within the window, and a multi-label Support Vector Machine (SVM) classifier then selects the color channel expected to give the best recognition result for that window. Selecting a color channel for each sliding window has been found to be more effective than using a single color channel for the whole word image. Five different features have been analyzed for multi-label SVM based color channel selection, of which the wavelet-transform-based feature performs best. Our color channel selection framework is script-independent and has been tested on English (Roman) and Devanagari (Indic) scripts. We have evaluated the approach on publicly available English datasets of scene images and video frames (ICDAR 2003, ICDAR 2013, MSRA-TD500, IIIT5K, SVT, YVT). For Devanagari script, we collected our own dataset. The experimental results are encouraging and show the advantage of the proposed method.
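
The per-window color channel selection described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the multi-label SVM is represented in its binary-relevance form (one linear decision function per channel label), the weights and biases are hypothetical hand-set values rather than trained ones, and the helper names `window_features` and `select_channels`, together with the toy contrast feature, are invented for this example.

```python
"""Per-window color channel selection with a binary-relevance linear scorer.

Hypothetical sketch: one linear decision function per channel label stands in
for a trained multi-label SVM, and a toy contrast feature stands in for the
paper's features. Weights below are hand-set assumptions, not learned values.
"""
from statistics import pstdev
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
CHANNELS = ("R", "G", "B")


def window_features(channel: Matrix, x0: int, width: int) -> List[float]:
    """Toy feature: intensity range and spread in columns [x0, x0 + width)."""
    pixels = [v for row in channel for v in row[x0:x0 + width]]
    return [(max(pixels) - min(pixels)) / 255.0, pstdev(pixels) / 255.0]


def select_channels(
    image: Dict[str, Matrix],
    weights: Dict[str, Sequence[float]],
    biases: Dict[str, float],
    width: int,
    step: int,
) -> List[Tuple[int, str]]:
    """Return (window start column, chosen channel) for each sliding window."""
    n_cols = len(image[CHANNELS[0]][0])
    choices = []
    for x0 in range(0, n_cols - width + 1, step):
        # One input vector per window: features of every channel, concatenated.
        feats = [f for ch in CHANNELS for f in window_features(image[ch], x0, width)]
        # Decision value of each per-channel (binary-relevance) linear classifier.
        scores = {
            ch: sum(w * f for w, f in zip(weights[ch], feats)) + biases[ch]
            for ch in CHANNELS
        }
        # Keep the channel with the highest decision value for this window.
        choices.append((x0, max(scores, key=scores.get)))
    return choices


# Toy 2x4 word image: text contrast is in R on the left and in B on the right.
image = {
    "R": [[0, 255, 10, 10], [0, 255, 10, 10]],
    "G": [[50, 50, 50, 50], [50, 50, 50, 50]],
    "B": [[10, 10, 0, 255], [10, 10, 0, 255]],
}
# Hypothetical weights: each label scores its own channel's contrast features.
weights = {
    "R": [1, 1, 0, 0, 0, 0],
    "G": [0, 0, 1, 1, 0, 0],
    "B": [0, 0, 0, 0, 1, 1],
}
biases = {"R": 0.0, "G": 0.0, "B": 0.0}
print(select_channels(image, weights, biases, width=2, step=2))  # [(0, 'R'), (2, 'B')]
```

Because the choice is made per window rather than per word, different parts of the same word can be read from different channels, which is the behavior the abstract reports as more effective than a single channel for the whole word image.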
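
The abstract reports that, among the five features analyzed, the wavelet-transform-based feature gave the best channel selection, but it does not state which wavelet, decomposition depth, or statistics were used. The sketch below assumes a single-level orthonormal Haar transform and summarizes each sub-band by its mean absolute coefficient; `haar_subband_energies` is a hypothetical helper, and computing it per channel for each window would yield one possible input vector for a channel selection classifier.

```python
"""Single-level 2-D Haar wavelet sub-band energies for one image window.

Assumptions (not from the paper): Haar wavelet, one decomposition level, and
mean absolute coefficient as the per-sub-band statistic. LH/HL naming
conventions differ between sources; the comments state which is which here.
"""
from typing import List

Matrix = List[List[float]]


def haar_subband_energies(window: Matrix) -> List[float]:
    """Return mean |coefficient| of the LL, LH, HL and HH sub-bands.

    The window must have even height and width.
    """
    h, w = len(window), len(window[0])
    if h % 2 or w % 2:
        raise ValueError("window height and width must be even")
    sums = {"LL": 0.0, "LH": 0.0, "HL": 0.0, "HH": 0.0}
    for r in range(0, h, 2):
        for c in range(0, w, 2):
            a, b = window[r][c], window[r][c + 1]
            d, e = window[r + 1][c], window[r + 1][c + 1]
            sums["LL"] += abs(a + b + d + e) / 2  # local average (approximation)
            sums["LH"] += abs(a + b - d - e) / 2  # top row minus bottom row
            sums["HL"] += abs(a - b + d - e) / 2  # left column minus right column
            sums["HH"] += abs(a - b - d + e) / 2  # diagonal detail
    n = (h // 2) * (w // 2)
    return [sums[k] / n for k in ("LL", "LH", "HL", "HH")]


# Vertical strokes respond in the left-minus-right (HL) band only.
stripes = [[0, 255, 0, 255]] * 4
print(haar_subband_energies(stripes))  # [255.0, 0.0, 255.0, 0.0]
```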
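
The recognizer in the abstract is driven by Pyramidal Histogram of Oriented Gradients features extracted from the selected channel. A simplified PHOG descriptor for one window might look like the following. The bin count, pyramid depth, central-difference gradients, and L1 normalization are all assumptions for illustration, not the paper's settings, and `phog` is a hypothetical function name.

```python
"""Simplified PHOG descriptor for one grayscale window.

Assumptions: central-difference gradients, unsigned orientation bins weighted
by gradient magnitude, a spatial pyramid of 2^l x 2^l cells for l = 0..levels,
concatenation of all cell histograms, and L1 normalization.
"""
import math
from typing import List

Matrix = List[List[float]]


def phog(window: Matrix, bins: int = 8, levels: int = 2) -> List[float]:
    h, w = len(window), len(window[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the window borders.
            gx = window[y][min(x + 1, w - 1)] - window[y][max(x - 1, 0)]
            gy = window[min(y + 1, h - 1)][x] - window[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.atan2(gy, gx) % math.pi  # unsigned angle in [0, pi)
    descriptor: List[float] = []
    for level in range(levels + 1):
        cells = 2 ** level
        for cy in range(cells):
            for cx in range(cells):
                hist = [0.0] * bins
                for y in range(cy * h // cells, (cy + 1) * h // cells):
                    for x in range(cx * w // cells, (cx + 1) * w // cells):
                        b = min(int(ang[y][x] / math.pi * bins), bins - 1)
                        hist[b] += mag[y][x]  # magnitude-weighted vote
                descriptor.extend(hist)
    total = sum(descriptor)
    return [v / total for v in descriptor] if total else descriptor


# A vertical edge: all gradient energy falls into orientation bin 0.
window = [[0, 0, 255, 255] for _ in range(4)]
print(len(phog(window)))  # 168
```

With 8 bins and levels 0 to 2 the descriptor has 8 × (1 + 4 + 16) = 168 entries; the coarse levels capture overall stroke orientation and the finer levels add spatial layout within the window.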
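
Recognition itself is performed by an HMM over the sequence of sliding-window observations. HMM recognizers of this kind are commonly decoded with the Viterbi algorithm; the abstract does not describe the model topology or emission densities, so the toy below is only a sketch. It assumes each window has already been quantized to a discrete symbol, and the two states ("stroke" and "gap") and all probabilities are hypothetical values, not trained character models.

```python
"""Log-space Viterbi decoding for a discrete-observation HMM (toy sketch)."""
import math
from typing import Dict, List, Sequence, Tuple


def viterbi(
    obs: Sequence[str],
    states: Sequence[str],
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    emit: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the most likely state path and its log-probability."""
    def log(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: best log-probability of any path ending in state s at time t.
    best = [{s: log(start[s]) + log(emit[s].get(obs[0], 0.0)) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor for state s at time t.
            prev = max(states, key=lambda p: best[t - 1][p] + log(trans[p].get(s, 0.0)))
            best[t][s] = (best[t - 1][prev] + log(trans[prev].get(s, 0.0))
                          + log(emit[s].get(obs[t], 0.0)))
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical two-state model over quantized window symbols.
states = ["stroke", "gap"]
start = {"stroke": 0.5, "gap": 0.5}
trans = {"stroke": {"stroke": 0.7, "gap": 0.3}, "gap": {"gap": 0.7, "stroke": 0.3}}
emit = {"stroke": {"dense": 0.9, "sparse": 0.1}, "gap": {"dense": 0.2, "sparse": 0.8}}
obs = ["dense", "dense", "sparse", "dense"]
path, log_prob = viterbi(obs, states, start, trans, emit)
print(path)  # ['stroke', 'stroke', 'gap', 'stroke']
```

Working in log space avoids numerical underflow when many windows are multiplied together, which matters for long word images.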

Author information

Authors and Affiliations

Department of ECE, Institute of Engineering & Management, Kolkata, India
Ayan Kumar Bhunia
Department of CSE, Indian Institute of Technology Roorkee, Roorkee, India
Gautam Kumar, Partha Pratim Roy & R. Balasubramanian
CVPR Unit, Indian Statistical Institute, Kolkata, India
Umapada Pal

Corresponding author

Correspondence to Partha Pratim Roy.

About this article

Cite this article

Bhunia, A.K., Kumar, G., Roy, P.P. et al. Text recognition in scene image and video frame using Color Channel selection. Multimed Tools Appl 77, 8551–8578 (2018). https://doi.org/10.1007/s11042-017-4750-6

Received: 07 October 2016
Revised: 15 April 2017
Accepted: 24 April 2017
Published: 05 May 2017
Issue Date: April 2018
DOI: https://doi.org/10.1007/s11042-017-4750-6
