Multimedia Tools and Applications, Volume 77, Issue 7, pp 8551–8578

Text recognition in scene image and video frame using Color Channel selection

  • Ayan Kumar Bhunia
  • Gautam Kumar
  • Partha Pratim Roy
  • R. Balasubramanian
  • Umapada Pal

Abstract

In recent years, recognition of text in natural scene images and video frames has received increasing attention from researchers because of its many complexities and challenges. Low resolution, blur, complex backgrounds, and variation in the font, color, and alignment of text within images and video frames make text recognition in such scenarios difficult. Most current approaches first apply a binarization algorithm to convert the images into binary form and then apply OCR to obtain the recognition result. In this paper, we present a novel approach based on color channel selection for text recognition from scene images and video frames. In this approach, a color channel is first selected automatically, and the selected channel is then used for text recognition. Our text recognition framework is based on a Hidden Markov Model (HMM) that uses Pyramidal Histogram of Oriented Gradient (PHOG) features extracted from the selected color channel. For each sliding window, our color channel selection approach analyzes the image properties within the window, and a multi-label Support Vector Machine (SVM) classifier then selects the color channel that provides the best recognition result for that window. Selecting a color channel for each sliding window was found to be more effective than using a single color channel for the whole word image. Five different features were analyzed for multi-label SVM-based color channel selection, and the wavelet transform based feature outperformed the others. Our color channel selection framework is script-independent; it has been tested on English (Roman) and Devanagari (Indic) scripts. We tested our approach on publicly available English datasets of both scene images and video frames (ICDAR 2003, ICDAR 2013, MSRA-TD500, IIIT5K, SVT, YVT). For Devanagari script, we collected our own dataset. The experimental results are encouraging and show the advantage of the proposed method.
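The per-window channel selection described in the abstract can be illustrated with a minimal NumPy sketch. It slides a fixed-width window across an RGB word image, describes each channel of the window by the mean absolute coefficient of every subband of a one-level Haar wavelet decomposition (one simple wavelet-based feature; the abstract does not specify the wavelet or statistics the authors used), and keeps the channel with the highest score. The function names, the window width and step, and especially `detail_energy_scorer` are hypothetical: the scorer is only a placeholder for the trained multi-label SVM and simply prefers the channel with the most high-frequency detail.

```python
import numpy as np

def haar_dwt2(x):
    """One level of the orthonormal 2-D Haar transform.
    Returns (LL, LH, HL, HH); an odd last row/column is cropped."""
    x = np.asarray(x, dtype=float)
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    a, b = x[0::2, 0::2], x[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]   # bottom-left, bottom-right
    ll = (a + b + c + d) / 2              # approximation
    lh = (a + b - c - d) / 2              # detail: difference between rows
    hl = (a - b + c - d) / 2              # detail: difference between columns
    hh = (a - b - c + d) / 2              # detail: diagonal
    return ll, lh, hl, hh

def wavelet_features(channel):
    """Mean absolute coefficient of each Haar subband -> 4 numbers."""
    return np.array([np.abs(s).mean() for s in haar_dwt2(channel)])

def detail_energy_scorer(features):
    """HYPOTHETICAL stand-in for per-channel SVM decision values:
    sums the three detail-subband features of each channel (R, G, B)."""
    return features.reshape(3, 4)[:, 1:].sum(axis=1)

def select_channels(img, win, step, scorer=detail_energy_scorer):
    """For each horizontal sliding window of an H x W x 3 word image,
    pick one color channel. Returns (channel indices, selected patches)."""
    img = np.asarray(img, dtype=float)
    choices, patches = [], []
    for x0 in range(0, max(img.shape[1] - win, 0) + 1, step):
        window = img[:, x0:x0 + win, :]
        feats = np.concatenate([wavelet_features(window[:, :, k]) for k in range(3)])
        k = int(np.argmax(scorer(feats)))  # channel with the highest score
        choices.append(k)
        patches.append(window[:, :, k])
    return choices, patches

# Demo: texture only in green (left half) and only in blue (right half).
checker = (np.indices((16, 16)).sum(axis=0) % 2) * 255.0
word = np.zeros((16, 32, 3))
word[:, :16, 1] = checker
word[:, 16:, 2] = checker
choices, _ = select_channels(word, win=8, step=8)
print(choices)  # [1, 1, 2, 2]
```

Because the left half of the synthetic word carries texture only in the green channel and the right half only in the blue channel, the sketch selects green for the first two windows and blue for the last two. In a trained system, the scorer could instead return per-channel decision values from a one-vs-rest multi-label classifier (for example scikit-learn's `OneVsRestClassifier` wrapping `SVC`, queried through `decision_function`); that choice of tooling is an assumption, not the authors' implementation.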
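The PHOG features that feed the HMM can likewise be sketched in NumPy, following the general construction of Bosch, Zisserman and Munoz (2007): gradient orientations are binned into a magnitude-weighted histogram for every cell of a spatial pyramid, and the cell histograms are concatenated and normalized. The function name `phog`, the 8 unsigned orientation bins, the pyramid depth of 2, and the L1 normalization are assumptions for illustration; the abstract does not give the authors' settings. In an HMM-based recognizer, one such vector would typically be computed for each sliding-window frame of the selected channel.

```python
import numpy as np

def phog(gray, bins=8, levels=2):
    """Pyramid Histogram of Oriented Gradients (illustrative sketch).

    At pyramid level `level` the image is split into 2**level x 2**level
    cells; each cell contributes a histogram of unsigned gradient
    orientations weighted by gradient magnitude. Assumed settings:
    8 bins, levels 0..2, L1 normalisation of the concatenated vector.
    """
    g = np.asarray(gray, dtype=float)
    gy, gx = np.gradient(g)                    # d/drow (y) and d/dcol (x)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)    # unsigned orientation in [0, pi)
    idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    h, w = g.shape
    hists = []
    for level in range(levels + 1):
        n = 2 ** level
        ys = np.linspace(0, h, n + 1).astype(int)   # cell row boundaries
        xs = np.linspace(0, w, n + 1).astype(int)   # cell column boundaries
        for i in range(n):
            for j in range(n):
                cell = (slice(ys[i], ys[i + 1]), slice(xs[j], xs[j + 1]))
                hists.append(np.bincount(idx[cell].ravel(),
                                         weights=mag[cell].ravel(),
                                         minlength=bins))
    v = np.concatenate(hists)
    total = v.sum()
    return v / total if total > 0 else v       # all-zero for a flat input

# Demo: a horizontal intensity ramp has purely horizontal gradients,
# so every cell's mass falls into orientation bin 0.
ramp = np.tile(np.arange(16.0), (16, 1))
v = phog(ramp)
print(v.shape)  # (168,)
```

With 8 bins and levels 0 to 2 the descriptor has 8 × (1 + 4 + 16) = 168 entries; a deeper pyramid captures finer spatial layout at the cost of a longer observation vector for the HMM.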

Keywords

Scene text recognition; Color channel selection; Hidden Markov model; Multi-script recognition

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Ayan Kumar Bhunia (1)
  • Gautam Kumar (2)
  • Partha Pratim Roy (2)
  • R. Balasubramanian (2)
  • Umapada Pal (3)
  1. Department of ECE, Institute of Engineering & Management, Kolkata, India
  2. Department of CSE, Indian Institute of Technology Roorkee, Roorkee, India
  3. CVPR Unit, Indian Statistical Institute, Kolkata, India

Personalised recommendations