Foreground Text Extraction in Color Document Images for Enhanced Readability

Nirmala, S.; Nagabhushan, P.

doi:10.1007/978-3-642-11164-8_63

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5909)

Included in the following conference series:

International Conference on Pattern Recognition and Machine Intelligence

Abstract

Text in documents is quite often printed on colorful, complex backgrounds. Such documents are difficult to read smoothly because of the background patterns and because the foreground text color blends into the background color. Furthermore, the character recognition rate is low when such documents are OCRed. In this paper we present a novel approach for extracting text information from complex color document images. The proposed approach is a three-stage process. In the first stage, the edge map is obtained using the Canny edge operator. The edge map is split into blocks of uniform size, and the image blocks are classified as text or non-text. In each text block, the possible text regions are identified and enclosed in tight bounding boxes using an x-y cut on the edge pixels. Text regions that are immediately adjacent in the vertical direction, where characters have been split horizontally between them, are then merged so that each character is fully enclosed in a single text region. In the second stage, some false text regions are eliminated based on a property of printed text. In the last stage, the foreground text in each text region is extracted by unsupervised thresholding using the data of the refined text regions. We conducted exhaustive experiments on documents with a variety of background complexities and printed foreground text of any color, font, and tilt. The experimental evaluations show that, on average, 98.03% of the text is identified. The processed document images yielded better OCR performance than the corresponding unprocessed source document images.
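The first- and third-stage steps described above (x-y cut on edge pixels, merging of vertically adjacent regions, and unsupervised thresholding within each region) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the block-classification rule, the merge criterion, the text property used for false-region elimination, or the thresholding method. The sketch assumes the Canny edge map has already been computed (for example with OpenCV's `cv2.Canny`) and is passed in as a 0/1 grid, and that thresholding runs on a single 8-bit grayscale channel derived from the color image. Block classification and false-region elimination are omitted. The function names `xy_cut`, `merge_vertical`, `otsu_threshold` and `extract_text`, the `max_gap` parameter, the use of Otsu's method as the unsupervised threshold, and the "text is the minority class" polarity rule are all hypothetical choices made for illustration.

```python
# Hypothetical sketch, not the authors' code: names and parameters are illustrative.
from typing import List, Tuple

Grid = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def _runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return the [start, end) intervals where a projection profile is non-zero."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def xy_cut(edges: Grid, box: Box) -> List[Box]:
    """Recursive x-y cut: split a box at empty rows, then at empty columns,
    until each returned box tightly encloses edge pixels with no empty
    row or column left inside it."""
    top, left, bottom, right = box
    row_runs = _runs([sum(edges[r][left:right]) for r in range(top, bottom)])
    if not row_runs:
        return []  # no edge pixels in this box
    if len(row_runs) > 1:  # horizontal gaps: cut between the row bands
        return [b for r0, r1 in row_runs
                for b in xy_cut(edges, (top + r0, left, top + r1, right))]
    top, bottom = top + row_runs[0][0], top + row_runs[0][1]
    col_runs = _runs([sum(edges[r][c] for r in range(top, bottom))
                      for c in range(left, right)])
    if len(col_runs) > 1:  # vertical gaps: cut between the column bands
        return [b for c0, c1 in col_runs
                for b in xy_cut(edges, (top, left + c0, bottom, left + c1))]
    c0, c1 = col_runs[0]
    return [(top, left + c0, bottom, left + c1)]


def merge_vertical(boxes: List[Box], max_gap: int = 1) -> List[Box]:
    """Merge boxes that overlap in columns and are at most max_gap empty rows
    apart (hypothetical rule for re-joining horizontally split characters)."""
    boxes = sorted(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                cols_overlap = min(a[3], b[3]) > max(a[1], b[1])
                gap = max(a[0], b[0]) - min(a[2], b[2])  # empty rows between them
                if cols_overlap and 0 <= gap <= max_gap:
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    changed = True
                    break
            if changed:
                break
    return boxes


def otsu_threshold(pixels: List[int]) -> int:
    """Otsu's method on 8-bit values: choose t maximising between-class variance,
    where class 0 is values <= t and class 1 is values > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0, best_t, best_var = 0, 0, 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def extract_text(gray: Grid, box: Box) -> Grid:
    """Binarise one text region; 1 marks foreground (text) pixels."""
    top, left, bottom, right = box
    region = [row[left:right] for row in gray[top:bottom]]
    t = otsu_threshold([p for row in region for p in row])
    mask = [[1 if p > t else 0 for p in row] for row in region]
    # Text colour is unknown, so assume text is the minority class (heuristic).
    if 2 * sum(map(sum, mask)) > len(mask) * len(mask[0]):
        mask = [[1 - v for v in row] for row in mask]
    return mask
```

On a small edge map where one character is split by a single empty row, `xy_cut` returns the fragments as separate boxes, `merge_vertical(boxes, max_gap=1)` rejoins the vertically adjacent fragments into one box, and `extract_text` binarizes that box. Plain Python lists keep the sketch dependency-free; a practical version would more likely operate on NumPy arrays.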

Author information

Authors and Affiliations

Dept of Studies in Computer Science, University of Mysore, Mysore, 570 006, India
S. Nirmala & P. Nagabhushan

Editor information

Editors and Affiliations

Electrical Engineering Department, Indian Institute of Technology Delhi, 110016, New Delhi, India
Santanu Chaudhury
Center for Soft Computing Research, Indian Statistical Institute, 700 108, Kolkata, India
Sushmita Mitra
Center for Soft Computing Research, Indian Statistical Institute
C. A. Murthy
Department of Electrical Engineering, Indian Institute of Science, 560012, Bangalore, India
P. S. Sastry
Center for Soft Computing Research, Machine Intelligence Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, 700 108, Kolkata, India
Sankar K. Pal

About this paper

Cite this paper

Nirmala, S., Nagabhushan, P. (2009). Foreground Text Extraction in Color Document Images for Enhanced Readability. In: Chaudhury, S., Mitra, S., Murthy, C.A., Sastry, P.S., Pal, S.K. (eds) Pattern Recognition and Machine Intelligence. PReMI 2009. Lecture Notes in Computer Science, vol 5909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11164-8_63

DOI: https://doi.org/10.1007/978-3-642-11164-8_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11163-1
Online ISBN: 978-3-642-11164-8
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition
