An Efficient Technique for Detection and Removal of Lines with Text Stroke Crossings in Document Images

Shobha Rani, N.; Vasudev, T.

doi:10.1007/978-981-10-5146-3_9


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 14)


Abstract

Optical Character Recognition (OCR) systems aim to read the characters in a document image automatically and precisely, and high overall recognition accuracy can be achieved only with efficient pre-processing. Recognizing characters in pre-printed document images is particularly challenging because it requires specialized pre-processing methods that depend on the document layout. In this paper we propose a pre-processing technique for removing horizontal/vertical lines from pre-printed documents. The major challenge in removing horizontal lines is retaining the pixels where the lines overlap the characters. The proposed algorithm works in two phases: the first phase performs image enhancement and line detection, and the second phase applies a convolution with a rectangular structuring element to detect text strokes crossing the lines found in phase one. The output image then undergoes post-enhancement and analysis using connected component analysis and area features to remove broken/dotted line structures. The experimental results are satisfactory and consistent enough to support subsequent processing of the document.
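The pipeline summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every name and parameter below is a hypothetical illustration. It assumes an already binarized image (True = ink) and skips the enhancement step, detects horizontal lines as foreground runs of at least `min_len` pixels (equivalent to a morphological opening with a 1 × `min_len` rectangle), approximates the rectangular-kernel convolution with box sums over an integral image to decide which line pixels lie on a crossing stroke, and finally drops connected components smaller than an assumed `min_area`.

```python
from collections import deque

import numpy as np


def horizontal_line_mask(img, min_len):
    """Mark ink pixels lying in a horizontal run of at least min_len pixels
    (same result as an opening with a 1 x min_len rectangular element)."""
    mask = np.zeros_like(img, dtype=bool)
    for r in range(img.shape[0]):
        c = 0
        while c < img.shape[1]:
            if img[r, c]:
                start = c
                while c < img.shape[1] and img[r, c]:
                    c += 1
                if c - start >= min_len:
                    mask[r, start:c] = True
            else:
                c += 1
    return mask


def window_counts(mask, row_lo, row_hi, col_half):
    """For each pixel (r, c), count True pixels of mask in rows
    [r + row_lo, r + row_hi) and columns [c - col_half, c + col_half].
    This is a box-filter convolution computed with an integral image."""
    h, w = mask.shape
    s = np.zeros((h + 1, w + 1), dtype=np.int64)
    s[1:, 1:] = mask.astype(np.int64).cumsum(0).cumsum(1)
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    a = np.clip(rows + row_lo, 0, h)
    b = np.clip(rows + row_hi, 0, h)
    c0 = np.clip(cols - col_half, 0, w)
    c1 = np.clip(cols + col_half + 1, 0, w)
    return s[b, c1] - s[a, c1] - s[b, c0] + s[a, c0]


def remove_small_components(img, min_area):
    """Delete 8-connected components whose pixel area is below min_area."""
    h, w = img.shape
    seen = np.zeros_like(img, dtype=bool)
    out = img.copy()
    for r in range(h):
        for c in range(w):
            if not img[r, c] or seen[r, c]:
                continue
            comp, queue = [], deque([(r, c)])
            seen[r, c] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            if len(comp) < min_area:
                for y, x in comp:
                    out[y, x] = False
    return out


def remove_lines(img, min_len=30, k=3, col_half=0, min_area=4):
    """Hypothetical end-to-end sketch: detect lines, keep line pixels that
    have text ink both above and below them (stroke crossings), then drop
    small leftover fragments such as pieces of broken/dotted lines."""
    img = img.astype(bool)
    line = horizontal_line_mask(img, min_len)
    text = img & ~line
    above = window_counts(text, -k, 0, col_half)    # k rows above each pixel
    below = window_counts(text, 1, k + 1, col_half)  # k rows below each pixel
    crossing = line & (above > 0) & (below > 0)
    return remove_small_components(text | crossing, min_area)
```

Here `k` should be at least the line thickness so that the windows above and below each line pixel reach past the line band. Vertical lines can be handled the same way by transposing, e.g. `remove_lines(img.T).T`. A plain area threshold also deletes genuine small marks such as i-dots or punctuation, so `min_area` (or additional shape features) would need tuning for real documents.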



Author information

Authors and Affiliations

Department of Computer Science, Amrita University, Mysore, India
N. Shobha Rani
Maharaja Research Foundation, Maharaja Institute of Technology, Mysore, India
T. Vasudev


Corresponding author

Correspondence to N. Shobha Rani.

Editor information

Editors and Affiliations

Department of Studies in Computer Science, University of Mysore, Mysore, Karnataka, India
D. S. Guru
Department of Master of Computer Application, Maharaja Institute of Technology, Mysore, Karnataka, India
T. Vasudev
Department of Computer Science and Engineering, Maharaja Institute of Technology, Mysore, Karnataka, India
H.K. Chethan
Department of Information Science and Engineering, Maharaja Institute of Technology, Mysore, Karnataka, India
Y.H. Sharath Kumar


About this paper

Cite this paper

Shobha Rani, N., Vasudev, T. (2018). An Efficient Technique for Detection and Removal of Lines with Text Stroke Crossings in Document Images. In: Guru, D., Vasudev, T., Chethan, H., Kumar, Y. (eds) Proceedings of International Conference on Cognition and Recognition. Lecture Notes in Networks and Systems, vol 14. Springer, Singapore. https://doi.org/10.1007/978-981-10-5146-3_9


DOI: https://doi.org/10.1007/978-981-10-5146-3_9
Published: 27 September 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5145-6
Online ISBN: 978-981-10-5146-3
eBook Packages: Engineering; Engineering (R0)
