Progress in Gujarati Document Processing and Character Recognition

Chapter in: Guide to OCR for Indic Scripts

Part of the book series: Advances in Pattern Recognition (ACVPR)

Abstract

Gujarati is an Indic script similar in appearance to other Indo-Aryan scripts, and printed Gujarati has a rich literary heritage. From an OCR perspective, however, some peculiarities of the script call for a different treatment. Research on Gujarati OCR is also more recent than OCR research on many other Indic scripts. In this chapter, we present a detailed account of the state of the art in Gujarati document analysis and character recognition. We begin with approaches to zone boundary detection, which is necessary for isolating words and for character segmentation and recognition. We show results for various feature extraction techniques, such as fringe maps, the discrete cosine transform, and wavelets; zone information and aspect ratios are also used for classification. We present recognition results with two types of classifiers, namely the nearest neighbor classifier and artificial neural networks, and we report experiments that combine the feature extraction methods with these classifiers. We find that a general regression neural network with wavelet features gives the best results, with a significant saving in training time. Since Indic scripts require syllabic reconstruction from OCR components, we also describe a procedure for generating text from the recognized glyph sequences and a method for post-processing.
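The best-performing combination reported above pairs wavelet features with a general regression neural network (GRNN). The following is a minimal sketch of that pairing, not the authors' implementation. It assumes binarized glyph images already size-normalized to a grid whose sides are divisible by 2 at every decomposition level, takes the mean of each 2x2 block as an unnormalized Haar approximation (LL) coefficient, and applies Specht's GRNN with one-hot class targets. The names `haar_approximation`, `wavelet_features`, `GRNNClassifier`, the smoothing parameter `sigma`, and the toy 4x4 "glyphs" are all hypothetical. The sketch also shows why GRNN training is cheap: fitting only stores the samples.

```python
import math
from typing import Dict, List, Sequence

Image = List[List[float]]


def haar_approximation(img: Image, levels: int = 1) -> Image:
    """Keep only the low-low (LL) sub-band of a 2-D Haar transform, `levels` times.

    Each LL coefficient is the mean of a 2x2 block (an unnormalized Haar
    approximation). Height and width must be even at every level.
    """
    for _ in range(levels):
        h, w = len(img), len(img[0])
        if h % 2 or w % 2:
            raise ValueError("image sides must be even at every level")
        img = [
            [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, w, 2)]
            for r in range(0, h, 2)
        ]
    return img


def wavelet_features(img: Image, levels: int = 1) -> List[float]:
    """Flatten the LL coefficients, row by row, into a feature vector."""
    return [v for row in haar_approximation(img, levels) for v in row]


class GRNNClassifier:
    """GRNN (Specht, 1991) used as a classifier via one-hot targets.

    Class score = sum of the Gaussian kernel weights of that class's stored
    patterns divided by the sum of all kernel weights.
    """

    def __init__(self, sigma: float = 0.5) -> None:
        self.sigma = sigma
        self.patterns: List[List[float]] = []
        self.labels: List[str] = []

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "GRNNClassifier":
        # "Training" is a single pass that memorizes every sample as a pattern unit.
        self.patterns = [list(x) for x in X]
        self.labels = list(y)
        return self

    def predict_scores(self, x: Sequence[float]) -> Dict[str, float]:
        d2 = [sum((a - b) ** 2 for a, b in zip(x, p)) for p in self.patterns]
        d2_min = min(d2)  # shifting by the minimum avoids underflow; it cancels in the ratio
        weights = [math.exp(-(d - d2_min) / (2 * self.sigma ** 2)) for d in d2]
        total = sum(weights)
        scores: Dict[str, float] = {}
        for w, label in zip(weights, self.labels):
            scores[label] = scores.get(label, 0.0) + w / total
        return scores

    def predict(self, x: Sequence[float]) -> str:
        scores = self.predict_scores(x)
        return max(scores, key=scores.get)


# Toy 4x4 "glyphs" (hypothetical data, 1 = ink).
vertical = [[0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
horizontal = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
noisy_vertical = [[0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0]]

clf = GRNNClassifier(sigma=0.5).fit(
    [wavelet_features(vertical), wavelet_features(horizontal)],
    ["vertical", "horizontal"],
)
print(clf.predict(wavelet_features(noisy_vertical)))  # vertical
```

Because fitting is a single pass, the cost moves to prediction, which compares the query against every stored pattern. `sigma` controls how smoothly the class scores vary with distance and would normally be tuned on held-out data.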


Notes

  1. Two conjuncts, /ksha/ and /jya/, are treated as if they were basic consonants in Gujarati script.
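Treating these conjuncts as basic consonants lets a recognizer assign each one a single glyph class, while the generated Unicode text must still spell each one as consonant + virama + consonant. The sketch below assumes that /ksha/ denotes ક્ષ (KA + virama + SSA) and /jya/ denotes જ્ઞ (JA + virama + NYA), the two conjuncts conventionally listed with the Gujarati consonants. The glyph labels such as `KSHA` and the helpers `glyphs_to_text` and `consonant_units` are hypothetical, not taken from the chapter.

```python
from typing import List, Sequence

VIRAMA = "\u0ACD"                    # GUJARATI SIGN VIRAMA (halant)
KSHA = "\u0A95" + VIRAMA + "\u0AB7"  # KA + virama + SSA, normally rendered as ક્ષ
JNYA = "\u0A9C" + VIRAMA + "\u0A9E"  # JA + virama + NYA, normally rendered as જ્ઞ
ATOMIC_CONJUNCTS = (KSHA, JNYA)

# Hypothetical glyph-class labels an OCR stage might emit, mapped to Unicode.
GLYPH_TO_TEXT = {
    "KA": "\u0A95",
    "JA": "\u0A9C",
    "KSHA": KSHA,         # recognized as one class, expands to three code points
    "JNYA": JNYA,
    "AA_SIGN": "\u0ABE",  # dependent vowel sign AA
}


def glyphs_to_text(labels: Sequence[str]) -> str:
    """Expand a recognized glyph-label sequence into a Unicode string."""
    return "".join(GLYPH_TO_TEXT[label] for label in labels)


def consonant_units(text: str) -> List[str]:
    """Split text into code points, but keep each atomic conjunct whole."""
    units: List[str] = []
    i = 0
    while i < len(text):
        for conj in ATOMIC_CONJUNCTS:
            if text.startswith(conj, i):
                units.append(conj)
                i += len(conj)
                break
        else:  # no conjunct starts here: emit a single code point
            units.append(text[i])
            i += 1
    return units


word = glyphs_to_text(["KSHA", "AA_SIGN"])
print(len(word), consonant_units(word) == [KSHA, "\u0ABE"])  # 4 True
```

The reverse direction, `consonant_units`, is one way to derive glyph-level ground truth from Unicode text under the same convention.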


Acknowledgment

Most of this work was supported by grants from the Ministry of Communications and Information Technology, Government of India, under the Resource Center for Indian Language Technology Solutions project and the Development of Robust Document Analysis and Recognition System for Printed Indian Scripts project.

Author information

Correspondence to Jignesh Dholakia.


Copyright information

© 2009 Springer-Verlag London Limited

About this chapter

Cite this chapter

Dholakia, J., Negi, A., Mohan, S.R. (2009). Progress in Gujarati Document Processing and Character Recognition. In: Govindaraju, V., Setlur, S. (eds) Guide to OCR for Indic Scripts. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-330-9_4


  • DOI: https://doi.org/10.1007/978-1-84800-330-9_4

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-329-3

  • Online ISBN: 978-1-84800-330-9

  • eBook Packages: Computer Science, Computer Science (R0)
