An Efficient Technique for Detection and Removal of Lines with Text Stroke Crossings in Document Images

  • Conference paper
Proceedings of International Conference on Cognition and Recognition

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 14)

Abstract

Optical Character Recognition (OCR) systems aim to read the characters in a document image automatically and precisely. High overall recognition accuracy can be achieved only through effective pre-processing. Recognizing characters in pre-printed document images is particularly challenging because it requires dedicated pre-processing methods that depend on the layout of the document. In this paper we propose a pre-processing technique for removing horizontal and vertical lines from pre-printed documents. The major challenge in removing horizontal lines is retaining the pixels where a line and a character overlap. The proposed algorithm works in two phases. In the first phase, image enhancement and line detection are performed. In the second phase, a convolution with a rectangular structuring element detects text stroke crossings on the lines found in the first phase. The output image is then subjected to post-enhancement and analysis, using connected component analysis and area features to remove broken or dotted line structures. The experimental results are satisfactory and consistent enough for subsequent processing of the document.
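The abstract gives no implementation details, so the following is only a minimal sketch of the pipeline's overall shape, not the authors' method. It assumes a binary image stored as lists of 0/1 values (1 = ink), detects horizontal lines by row fill ratio, and replaces the convolution with a rectangular structuring element by a simple check of a one-pixel-wide window above and below each line pixel. All function names and the parameters `min_fill`, `reach`, and `min_area` are hypothetical.

```python
from collections import deque

def detect_horizontal_lines(img, min_fill=0.8):
    """Return indices of rows whose ink fraction is at least min_fill (assumed rule)."""
    width = len(img[0])
    return [r for r, row in enumerate(img) if sum(row) >= min_fill * width]

def remove_lines_keep_crossings(img, line_rows, reach=1):
    """Erase line pixels unless ink lies both above and below the line band."""
    out = [row[:] for row in img]
    rows = set(line_rows)
    height = len(img)
    for r in line_rows:
        # Find the top and bottom rows of the contiguous band containing r.
        top = r
        while (top - 1) in rows:
            top -= 1
        bottom = r
        while (bottom + 1) in rows:
            bottom += 1
        for c, value in enumerate(img[r]):
            if not value:
                continue
            above = any(img[y][c] for y in range(max(0, top - reach), top))
            below = any(img[y][c]
                        for y in range(bottom + 1, min(height, bottom + 1 + reach)))
            if not (above and below):
                out[r][c] = 0  # plain line pixel: no stroke passes through it
    return out

def remove_small_components(img, min_area=2):
    """Drop 8-connected ink components with fewer than min_area pixels."""
    height, width = len(img), len(img[0])
    seen = [[False] * width for _ in range(height)]
    out = [row[:] for row in img]
    for r in range(height):
        for c in range(width):
            if not img[r][c] or seen[r][c]:
                continue
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) < min_area:
                for y, x in component:
                    out[y][x] = 0
    return out

# Toy image: a vertical stroke crossing a horizontal line, plus one stray dot.
img = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0],
]
line_rows = detect_horizontal_lines(img)
cleaned = remove_small_components(remove_lines_keep_crossings(img, line_rows))
```

In this toy case the line row is erased except where the vertical stroke crosses it, and the isolated dot is removed by the area filter. Vertical lines could be handled the same way by transposing the image first.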

Author information

Corresponding author

Correspondence to N. Shobha Rani.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Shobha Rani, N., Vasudev, T. (2018). An Efficient Technique for Detection and Removal of Lines with Text Stroke Crossings in Document Images. In: Guru, D., Vasudev, T., Chethan, H., Kumar, Y. (eds) Proceedings of International Conference on Cognition and Recognition. Lecture Notes in Networks and Systems, vol 14. Springer, Singapore. https://doi.org/10.1007/978-981-10-5146-3_9

  • DOI: https://doi.org/10.1007/978-981-10-5146-3_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-5145-6

  • Online ISBN: 978-981-10-5146-3

  • eBook Packages: Engineering, Engineering (R0)
