A novel method for binarization of scene text images and its application in text identification

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

The aim of this article is twofold. First, we propose an effective methodology for the binarization of scene images, using the publicly available ICDAR 2011 Born Digital data set for our study. We introduce a new concept, the variance map of a gray-level image, to detect text boundaries in an image. Based on this boundary information, the image is binarized by means of adaptive thresholding. This binarization procedure produces a number of connected components. Second, these connected components are examined in order to identify possible text components. In this context, we propose a number of shape-based features that distinguish between text and non-text components. We treat text component identification as a one-class classification problem, because ground truth information is available only for the text class in the ICDAR 2011 Born Digital data set. The ground truth text components are used to estimate a statistical distribution of the shape-based features. Here, we observe that the features may not all follow a single family of distributions. Therefore, we construct a joint distribution using a multivariate Gaussian copula, which allows different marginal distributions to be coupled. Our experiments suggest that the copula-based model describes the feature distribution better than a multivariate Gaussian distribution. Finally, a connected component of unknown class is evaluated under the trained statistical model, and a hypothesis test decides whether it is a possible text component. For a comparative study, we consider a number of state-of-the-art methods. Our proposed approach significantly outperforms most of these methods in terms of recall, precision and F-measure in both the binarization and text identification tasks.
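The coupling step can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes only two features, a hypothetical normal marginal and a hypothetical exponential marginal fitted by moments, and synthetic training data. The abstract does not specify the hypothesis test, so a simple stand-in is used: a component is accepted as text when its joint log-density is at least the 5th percentile of the training log-densities. All names (`NormalMarginal`, `ExponentialMarginal`, `fit_rho`, `joint_logpdf`, `looks_like_text`) are illustrative.

```python
import math
import random
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist()  # standard normal used for the copula's normal scores


def clamp(u, eps=1e-9):
    """Keep a CDF value strictly inside (0, 1) so inv_cdf stays finite."""
    return min(max(u, eps), 1.0 - eps)


class NormalMarginal:
    """Hypothetical marginal: normal, fitted by sample mean and std."""

    def __init__(self, xs):
        self.mu, self.sigma = mean(xs), stdev(xs)

    def cdf(self, x):
        return NormalDist(self.mu, self.sigma).cdf(x)

    def logpdf(self, x):
        z = (x - self.mu) / self.sigma
        return -0.5 * z * z - math.log(self.sigma * math.sqrt(2.0 * math.pi))


class ExponentialMarginal:
    """Hypothetical marginal: exponential with rate = 1 / sample mean."""

    def __init__(self, xs):
        self.rate = 1.0 / mean(xs)

    def cdf(self, x):
        return 1.0 - math.exp(-self.rate * x) if x > 0 else 0.0

    def logpdf(self, x):
        return math.log(self.rate) - self.rate * x if x >= 0 else -math.inf


def normal_score(marginal, x):
    """Map a feature value to a standard normal score via its marginal CDF."""
    return STD_NORMAL.inv_cdf(clamp(marginal.cdf(x)))


def fit_rho(xs, ys, marg_x, marg_y):
    """Copula correlation: Pearson correlation of the normal scores."""
    zx = [normal_score(marg_x, x) for x in xs]
    zy = [normal_score(marg_y, y) for y in ys]
    mx, my = mean(zx), mean(zy)
    sxy = sum((a - mx) * (b - my) for a, b in zip(zx, zy))
    sxx = sum((a - mx) ** 2 for a in zx)
    syy = sum((b - my) ** 2 for b in zy)
    return sxy / math.sqrt(sxx * syy)


def joint_logpdf(x, y, marg_x, marg_y, rho):
    """Log joint density = log Gaussian-copula density + marginal log-densities."""
    z1, z2 = normal_score(marg_x, x), normal_score(marg_y, y)
    one_minus = 1.0 - rho * rho
    log_copula = (-0.5 * math.log(one_minus)
                  - (rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2)
                  / (2.0 * one_minus))
    return log_copula + marg_x.logpdf(x) + marg_y.logpdf(y)


# Synthetic "text" training features (assumption): correlated normal scores,
# one pushed through a normal quantile map, the other through an exponential one.
random.seed(0)
TRUE_RHO = 0.7
xs, ys = [], []
for _ in range(2000):
    a, b = random.gauss(0, 1), random.gauss(0, 1)
    z2 = TRUE_RHO * a + math.sqrt(1.0 - TRUE_RHO ** 2) * b
    xs.append(5.0 + 2.0 * a)
    ys.append(-math.log(STD_NORMAL.cdf(-z2)))  # Exp(1) quantile of Phi(z2)

marg_x, marg_y = NormalMarginal(xs), ExponentialMarginal(ys)
rho = fit_rho(xs, ys, marg_x, marg_y)

# Stand-in decision rule (assumption): 5th percentile of training log-densities.
train_scores = sorted(joint_logpdf(x, y, marg_x, marg_y, rho)
                      for x, y in zip(xs, ys))
threshold = train_scores[int(0.05 * len(train_scores))]


def looks_like_text(x, y):
    """Accept a component whose joint log-density is not unusually low."""
    return joint_logpdf(x, y, marg_x, marg_y, rho) >= threshold
```

Calling `looks_like_text(5.0, 1.0)` evaluates a feature pair near the centre of the training data, while a pair such as `(20.0, 15.0)` lies far in the tails of both marginals. A plain multivariate Gaussian would force both features into normal marginals; the copula keeps each marginal in its own family and models only the dependence through `rho`.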

Author information

Corresponding author

Correspondence to Ranjit Ghoshal.

About this article

Cite this article

Ghoshal, R., Roy, A., Banerjee, A. et al. A novel method for binarization of scene text images and its application in text identification. Pattern Anal Applic 22, 1361–1375 (2019). https://doi.org/10.1007/s10044-018-0687-2

  • DOI: https://doi.org/10.1007/s10044-018-0687-2