A Mean-Based Thresholding Approach for Broken Character Segmentation from Printed Gujarati Documents

  • Riddhi J. Shah
  • Tushar V. Ratanpara
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 379)

Abstract

The major problem faced by Gujarati optical character recognition (OCR) systems can be attributed to the presence of broken characters in machine-printed Gujarati document images. Such characters can cause errors in the character segmentation process. Broken characters arise from noisy scanning, from older documents with low-quality printing, and from thresholding errors, so it is necessary to identify and segment them properly. This paper therefore presents a mean-based thresholding technique for broken character segmentation from printed Gujarati documents. Line segmentation is used to extract lines from the Gujarati document image, and individual characters are then extracted using the vertical projection profile method. Broken characters are identified using the mean-based thresholding (MBT) algorithm, and heuristic information is used to merge the identified broken characters. The main purpose of this paper is to merge vertically broken and naturally broken Gujarati characters into a single glyph from the document image. Experiments are carried out on various types of Gujarati documents (A, B, C, and D), and an accuracy of 79.93 % is achieved.
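The abstract describes a pipeline of line segmentation, vertical projection profile character extraction, MBT-based identification of broken characters, and heuristic merging. The sketch below illustrates that flow on a toy binarized image; it is a minimal sketch under stated assumptions, not the authors' implementation. Assumptions: the page is already binarized (1 = ink); line segmentation uses a horizontal projection profile (the abstract does not name the method); the MBT rule is taken to flag a character segment as a fragment when its width is below the mean segment width of its line; and the merge heuristic (small blank gap, bounded merged width) with its hypothetical parameters `max_gap` and `max_ratio` stands in for the paper's heuristic information. All function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]   # binarized page: 1 = ink (foreground), 0 = background
Span = Tuple[int, int]    # half-open index range [start, end)


def runs(profile: List[int]) -> List[Span]:
    """Return the [start, end) ranges where a projection profile is non-zero."""
    spans: List[Span] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(img: Image) -> List[Span]:
    # Assumption: horizontal projection profile (ink count per row);
    # blank rows separate text lines.
    return runs([sum(row) for row in img])


def segment_characters(img: Image, top: int, bottom: int) -> List[Span]:
    # Vertical projection profile (ink count per column) inside one text line;
    # blank columns separate character segments.
    width = len(img[0]) if img else 0
    profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(profile)


def mean_width(chars: List[Span]) -> float:
    return sum(e - s for s, e in chars) / len(chars) if chars else 0.0


def flag_broken(chars: List[Span], factor: float = 1.0) -> List[bool]:
    # Hypothetical MBT rule: a segment narrower than factor * mean width
    # of its line is treated as a candidate fragment of a broken character.
    threshold = factor * mean_width(chars)
    return [(e - s) < threshold for s, e in chars]


def merge_broken(chars: List[Span], max_gap: int = 2,
                 max_ratio: float = 2.0) -> List[Span]:
    # Hypothetical merge heuristic: join a segment with the preceding one when
    # either is flagged as a fragment, the blank gap is at most max_gap columns,
    # and the merged width stays within max_ratio * mean width.
    if not chars:
        return []
    mean_w = mean_width(chars)
    broken = flag_broken(chars)
    merged, merged_broken = [chars[0]], [broken[0]]
    for (s, e), is_broken in zip(chars[1:], broken[1:]):
        ps, pe = merged[-1]
        if ((is_broken or merged_broken[-1])
                and s - pe <= max_gap
                and e - ps <= max_ratio * mean_w):
            merged[-1] = (ps, e)
            merged_broken[-1] = (e - ps) < mean_w  # still narrow after merging?
        else:
            merged.append((s, e))
            merged_broken.append(is_broken)
    return merged


def segment_document(img: Image, max_gap: int = 2,
                     max_ratio: float = 2.0) -> List[Tuple[Span, List[Span]]]:
    """Line segmentation, then per-line character segmentation and merging."""
    return [((top, bottom),
             merge_broken(segment_characters(img, top, bottom), max_gap, max_ratio))
            for top, bottom in segment_lines(img)]


if __name__ == "__main__":
    # Toy 6 x 22 page: one text line with a whole glyph, a glyph split into
    # two narrow fragments (columns 8-9 and 11-12), and another whole glyph.
    page = [[0] * 22 for _ in range(6)]
    for r in range(1, 5):
        for c in [1, 2, 3, 4, 8, 9, 11, 12, 16, 17, 18, 19]:
            page[r][c] = 1
    print(segment_document(page))  # [((1, 5), [(1, 5), (8, 13), (16, 20)])]
```

On this toy page the segment widths are 4, 2, 2 and 4 (mean 3), so the two 2-column pieces are flagged and merged into one segment spanning columns 8 to 12, while the 3-column gaps keep them separate from the whole glyphs. On real scanned pages the thresholds would need tuning per document type.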

Keywords

Document Image, Optical Character Recognition, Character Segmentation, Heuristic Information, Break Character

Copyright information

© Springer India 2016

Authors and Affiliations

  1. Department of Computer Engineering, Dharmsinh Desai University, Nadiad, India