Post-processing of Video Text Detection

Lu, Tong; Palaiahnakote, Shivakumara; Tan, Chew Lim; Liu, Wenyin

doi:10.1007/978-1-4471-6515-6_5

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

This chapter introduces methods for text binarization and recognition as post-processing steps for text detection. Binarization separates text from the background of a detected text block. As an example, the chapter presents a fusion method that combines wavelet and gradient bands for text lines and then applies k-means clustering to every row and column to binarize the image. Because video and natural scene images suffer from low resolution and complex backgrounds, it is hard to develop binarization methods that preserve character shapes without losing text pixels. The chapter therefore also presents a method for character shape reconstruction using the ring radius transform. For each pixel in the edge image of the input character, the method computes a radius value based on the distance to the nearest white pixel. Medial axis pixels are then found horizontally and vertically by selecting the maximum radius values between strokes. These medial axis values help fill the gaps between end points while preserving the shape of the character.
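
The radius and medial axis steps described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names `ring_radius_map` and `horizontal_medial_axis`, the use of Euclidean distance, the brute-force search, and the representation of the edge image as a list of 0/1 rows are all assumptions made for illustration. Only the horizontal pass is shown; a vertical pass would apply the same logic to columns.

```python
import math


def ring_radius_map(edge):
    """For every pixel, return the distance to the nearest white (1) pixel.

    Assumption: Euclidean distance, computed by brute force for clarity.
    """
    whites = [(r, c) for r, row in enumerate(edge) for c, v in enumerate(row) if v]
    if not whites:
        raise ValueError("edge image contains no white pixels")
    return [
        [min(math.hypot(r - wr, c - wc) for wr, wc in whites) for c in range(len(row))]
        for r, row in enumerate(edge)
    ]


def horizontal_medial_axis(edge, radius):
    """In each row, between consecutive white pixels, keep the pixel(s)
    whose radius value is the maximum within that gap."""
    axis = set()
    for r, row in enumerate(edge):
        cols = [c for c, v in enumerate(row) if v]
        for left, right in zip(cols, cols[1:]):
            gap = range(left + 1, right)
            if len(gap) == 0:
                continue  # adjacent edge pixels leave no interior
            best = max(radius[r][c] for c in gap)
            axis.update((r, c) for c in gap if radius[r][c] == best)
    return axis


if __name__ == "__main__":
    # Hypothetical toy input: two vertical edges of a stroke at columns 1 and 5.
    edge = [[0, 1, 0, 0, 0, 1, 0] for _ in range(5)]
    radius = ring_radius_map(edge)
    print(sorted(horizontal_medial_axis(edge, radius)))
```

In this toy input the maximum radius in every row falls midway between the two edges, which is where a medial axis of the stroke would be expected. For real images, a distance transform routine from an image-processing library would be a more efficient way to obtain the radius map.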

Author information

Authors and Affiliations

Department of Computer Science and Technology, Nanjing University, Nanjing, China
Tong Lu
Faculty of CSIT, University of Malaya, Kuala Lumpur, Malaysia
Shivakumara Palaiahnakote
National University of Singapore, Singapore, Singapore
Chew Lim Tan
Multimedia Software Engineering Research Center, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
Wenyin Liu

About this chapter

Cite this chapter

Lu, T., Palaiahnakote, S., Tan, C.L., Liu, W. (2014). Post-processing of Video Text Detection. In: Video Text Detection. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-6515-6_5

DOI: https://doi.org/10.1007/978-1-4471-6515-6_5
Published: 30 June 2014
Publisher Name: Springer, London
Print ISBN: 978-1-4471-6514-9
Online ISBN: 978-1-4471-6515-6
eBook Packages: Computer Science, Computer Science (R0)