Text Line Segmentation for Unconstrained Handwritten Document Images Using Neighborhood Connected Component Analysis

  • Abhishek Khandelwal
  • Pritha Choudhury
  • Ram Sarkar
  • Subhadip Basu
  • Mita Nasipuri
  • Nibaran Das
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5909)

Abstract

Text line extraction is the first and one of the most critical steps in optical character recognition (OCR) of unconstrained handwritten documents. The present work reports a new methodology that compares neighboring connected components to determine whether they belong to the same text line. Components that are very small or very large compared to the average component height are set aside in the preprocessing step. During post-processing, these components are reconsidered and allocated to the lines to which they most suitably belong. The performance of the technique is evaluated on the benchmark training dataset of the ICDAR 2009 handwriting segmentation contest, which consists of English, French, German and Greek handwritten texts. The overall text line identification accuracy observed on this dataset is approximately 93.35%.
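
The three stages named in the abstract (height-based filtering, neighbor comparison, and reallocation of the filtered components) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the comparison rule or any thresholds, so the `Box` type, the height limits `low` and `high`, the vertical-overlap test, and the nearest-line rule used below are all assumptions chosen only to make the idea concrete. The sketch works on component bounding boxes rather than on image pixels.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Box:
    """Bounding box of one connected component (hypothetical helper type)."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def height(self) -> int:
        return self.y1 - self.y0

    @property
    def y_center(self) -> float:
        return (self.y0 + self.y1) / 2


def split_by_height(boxes, low=0.5, high=2.0):
    """Separate components whose height is far from the average.

    The factors `low` and `high` are assumed values, not taken from the paper.
    """
    avg = mean(b.height for b in boxes)
    normal, outliers = [], []
    for b in boxes:
        (normal if low * avg <= b.height <= high * avg else outliers).append(b)
    return normal, outliers


def vertical_overlap(a, b):
    """Fraction of the shorter box's height shared by both boxes."""
    shared = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(0, shared) / max(1, min(a.height, b.height))


def group_into_lines(boxes, min_overlap=0.5):
    """Scan left to right and attach each box to the line whose most recent
    component overlaps it best vertically (an assumed neighbor rule)."""
    lines = []
    for box in sorted(boxes, key=lambda b: b.x0):
        best, best_overlap = None, min_overlap
        for line in lines:
            overlap = vertical_overlap(line[-1], box)
            if overlap >= best_overlap:
                best, best_overlap = line, overlap
        if best is None:
            lines.append([box])
        else:
            best.append(box)
    # Order the lines from top to bottom.
    lines.sort(key=lambda line: mean(b.y_center for b in line))
    return lines


def assign_outliers(lines, outliers):
    """Give each set-aside component to the vertically nearest line."""
    for box in outliers:
        nearest = min(
            lines,
            key=lambda line: abs(mean(b.y_center for b in line) - box.y_center),
        )
        nearest.append(box)
    return lines
```

A caller would run `split_by_height`, pass the normal components to `group_into_lines`, and then hand the result and the outliers to `assign_outliers`. Comparing each component only with the most recent member of a line keeps the test local, which is one simple way to tolerate lines that drift or skew across the page.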

Keywords

Text line identification; handwritten script; neighborhood connected component analysis

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Abhishek Khandelwal (1)
  • Pritha Choudhury (1)
  • Ram Sarkar (2)
  • Subhadip Basu (2)
  • Mita Nasipuri (2)
  • Nibaran Das (2)
  1. CSE Department, Sikkim Manipal Institute of Technology, Sikkim, India
  2. CSE Department, Jadavpur University, Kolkata, India