Abstract
Optical Character Recognition (OCR) systems automatically read the characters in a document image. High recognition accuracy can be achieved only with efficient pre-processing. Recognizing characters in pre-printed document images is particularly challenging because it requires dedicated pre-processing methods that depend on the document layout. In this paper we propose a pre-processing technique for removing horizontal and vertical lines from pre-printed documents. The major challenge in removing the horizontal lines is retaining the pixels where a line and a character overlap. The proposed algorithm works in two phases. The first phase performs image enhancement and line detection. The second phase applies a convolution with a rectangular structuring element to detect text stroke crossings on the lines detected in the first phase. The output image then undergoes post-enhancement, in which connected component analysis and area features are used to remove broken or dotted line remnants. The experimental results are satisfactory and consistent enough for subsequent processing of the document.
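The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it handles horizontal lines only, skips the enhancement step, and replaces the convolution with a rectangular structuring element by a simpler check for ink directly above and below the line. The function names, the `min_run_frac` threshold, and the `min_area` parameter are all hypothetical, and the image is assumed to be already binarized as a list of rows with 1 for ink and 0 for background.

```python
from collections import deque


def find_line_rows(img, min_run_frac=0.8):
    """Return indices of rows whose longest horizontal run of ink
    covers at least min_run_frac of the image width (assumed threshold)."""
    width = len(img[0])
    rows = []
    for y, row in enumerate(img):
        longest = run = 0
        for px in row:
            run = run + 1 if px else 0
            longest = max(longest, run)
        if longest >= min_run_frac * width:
            rows.append(y)
    return rows


def remove_lines_keep_crossings(img, line_rows):
    """Erase line pixels, except in columns where ink exists both directly
    above and directly below the line band (treated as a stroke crossing).
    This neighbor test is a simplification of the paper's convolution step."""
    out = [row[:] for row in img]
    height, width = len(img), len(img[0])
    line_set = set(line_rows)
    for y in line_rows:
        # Find the nearest rows above and below that are not part of a line.
        up = y - 1
        while up in line_set:
            up -= 1
        down = y + 1
        while down in line_set:
            down += 1
        for x in range(width):
            above = up >= 0 and img[up][x]
            below = down < height and img[down][x]
            if not (above and below):
                out[y][x] = 0
    return out


def drop_small_components(img, min_area):
    """Remove 8-connected components with fewer than min_area pixels,
    standing in for the area-based cleanup of broken or dotted remnants."""
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not out[y][x] or seen[y][x]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and out[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) < min_area:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out
```

A typical call chain would be `drop_small_components(remove_lines_keep_crossings(img, find_line_rows(img)), min_area)`. Vertical lines could be handled the same way by transposing the image first. A real system would more likely use a morphology library than pure Python loops, which are used here only to keep the sketch dependency-free.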
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shobha Rani, N., Vasudev, T. (2018). An Efficient Technique for Detection and Removal of Lines with Text Stroke Crossings in Document Images. In: Guru, D., Vasudev, T., Chethan, H., Kumar, Y. (eds) Proceedings of International Conference on Cognition and Recognition . Lecture Notes in Networks and Systems, vol 14. Springer, Singapore. https://doi.org/10.1007/978-981-10-5146-3_9
DOI: https://doi.org/10.1007/978-981-10-5146-3_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5145-6
Online ISBN: 978-981-10-5146-3
eBook Packages: Engineering, Engineering (R0)