Post-processing of Video Text Detection

  • Tong Lu
  • Shivakumara Palaiahnakote
  • Chew Lim Tan
  • Wenyin Liu
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)


This chapter introduces text binarization and recognition methods that serve as post-processing for text detection. Binarization separates text pixels from the background of a detected text block. As an example, the chapter presents a fusion method that combines wavelet and gradient bands of a text line, with k-means clustering applied to every row and column to binarize the image. Because video and natural scene images suffer from low resolution and complex backgrounds, it is hard to develop binarization methods that preserve character shapes without losing text pixels. The chapter therefore also presents a method for character shape reconstruction using the ring radius transform. For each pixel in the edge image of the input character image, the method obtains a radius value based on the distance to the nearest white pixel. Medial axis pixels are then found horizontally and vertically by selecting the maximum radius values between the strokes. The medial axis helps fill the gaps between end points while preserving the shape of the character.
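
The radius and medial axis steps described above can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes the radius is the Euclidean distance to the nearest white (edge) pixel, scans only the horizontal direction, and treats every run of black pixels bounded by white pixels on both sides as a candidate gap. The function names `ring_radius` and `horizontal_medial_axis` are hypothetical.

```python
import math


def ring_radius(edge):
    """Return, for every pixel, the distance to the nearest white pixel.

    `edge` is a list of rows holding 0 (black) or 1 (white edge pixel).
    Assumption: Euclidean distance, computed by brute force for clarity.
    """
    whites = [(r, c) for r, row in enumerate(edge)
              for c, v in enumerate(row) if v]
    if not whites:
        raise ValueError("edge image contains no white pixels")
    return [[min(math.hypot(r - wr, c - wc) for wr, wc in whites)
             for c in range(len(row))]
            for r, row in enumerate(edge)]


def horizontal_medial_axis(edge, radius):
    """Pick one medial axis candidate per horizontal gap.

    For each row, every run of black pixels lying between two white
    pixels is scanned, and the pixel with the largest radius is kept.
    Ties are resolved in favour of the leftmost pixel (a simplification).
    """
    picks = []
    for r, row in enumerate(edge):
        white_cols = [c for c, v in enumerate(row) if v]
        for left, right in zip(white_cols, white_cols[1:]):
            gap = range(left + 1, right)
            if len(gap) > 0:
                best = max(gap, key=lambda c: radius[r][c])
                picks.append((r, best))
    return picks


# Toy edge image: two vertical edges at columns 1 and 5.
edge = [
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
]
radius = ring_radius(edge)
axis = horizontal_medial_axis(edge, radius)
```

In this toy image the candidate in each row falls at column 3, midway between the two edges. A vertical pass would apply the same scan to the columns, and a practical version would likely replace the brute-force search with a proper distance transform.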


Keywords: Medial Axis, Text Line, Outer Contour, Gradient Vector Flow, Scene Text



Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  • Tong Lu (1)
  • Shivakumara Palaiahnakote (2)
  • Chew Lim Tan (3)
  • Wenyin Liu (4)

  1. Department of Computer Science and Technology, Nanjing University, Nanjing, China
  2. Faculty of CSIT, University of Malaya, Kuala Lumpur, Malaysia
  3. National University of Singapore, Singapore, Singapore
  4. Multimedia Software Engineering Research Center, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
