Progress in Gujarati Document Processing and Character Recognition

Dholakia, Jignesh; Negi, Atul; Mohan, S. Rama

doi:10.1007/978-1-84800-330-9_4

Part of the book series: Advances in Pattern Recognition (ACVPR)

Abstract

Gujarati is an Indic script similar in appearance to other Indo-Aryan scripts, and printed Gujarati has a rich literary heritage. From an OCR perspective, some of its peculiarities call for different treatment. Research on Gujarati OCR is recent compared with OCR research on many other Indic scripts. In this chapter we present a detailed account of the state of the art in Gujarati document analysis and character recognition. We begin with approaches to zone boundary detection, which is necessary for isolating words and for character segmentation and recognition. We show results for several feature extraction techniques, including fringe maps, the discrete cosine transform, and wavelets. Zone information and aspect ratios are also used for classification. We present recognition results with two types of classifiers, namely the nearest neighbor classifier and artificial neural networks, and report experiments on various combinations of feature extraction methods and classifiers. We find that a general regression neural network with wavelet features gives the best results, with significant savings in training time. Since Indic scripts require syllabic reconstruction from OCR components, a procedure for text generation from the recognized glyph sequences and a method for post-processing are also described.
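To make the two classifier families named in the abstract concrete, the following is a minimal Python sketch, not the chapter's implementation. It assumes each glyph has already been reduced to a fixed-length feature vector (for example wavelet or DCT coefficients); the toy vectors, the labels, the function names, and the smoothing parameter `sigma` are all hypothetical. The general regression neural network (GRNN) is written in its usual classification form: every stored training pattern contributes a Gaussian kernel weight to its own class, and the class with the largest normalized weight wins. Because the network only stores the training patterns, there is no iterative weight fitting, which is one plausible reading of the training-time saving the abstract reports.

```python
import math

# Hypothetical toy data: (feature vector, glyph label). Real features would be
# wavelet or DCT coefficients computed from segmented glyph images.
TRAIN = [
    ([0.0, 0.0], "ka"),
    ([0.1, 0.2], "ka"),
    ([1.0, 1.0], "kha"),
    ([0.9, 1.1], "kha"),
]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor(train, query):
    """Return the label of the training vector closest to the query."""
    return min(train, key=lambda item: sq_dist(item[0], query))[1]


def grnn_classify(train, query, sigma=0.5):
    """Classify with a GRNN-style kernel estimate.

    Returns (label, scores), where scores maps each class to its share of
    the total Gaussian kernel weight. sigma is an assumed smoothing value.
    """
    weights = {}
    for features, label in train:
        w = math.exp(-sq_dist(features, query) / (2.0 * sigma ** 2))
        weights[label] = weights.get(label, 0.0) + w
    total = sum(weights.values())
    if total == 0.0:
        # All kernels underflowed (query far from every pattern):
        # fall back to the nearest stored pattern.
        return nearest_neighbor(train, query), {}
    scores = {label: w / total for label, w in weights.items()}
    return max(scores, key=scores.get), scores
```

Calling `grnn_classify(TRAIN, [0.2, 0.1])` returns the winning label together with per-class scores that sum to one, while `nearest_neighbor(TRAIN, [0.2, 0.1])` returns only the label of the single closest pattern. The per-class scores are what make the kernel approach convenient for building confusion sets or passing alternatives to a post-processing stage.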

Notes

1. Two conjuncts, /ksha/ and /jya/, are treated as if they were basic consonants in Gujarati script.

Acknowledgment

Most of this work was supported by grants from the Ministry of Communications and Information Technology, Government of India, under the Resource Center for Indian Language Technology Solutions project and the Development of Robust Document Analysis and Recognition System for Printed Indian Scripts project.

Author information

Authors and Affiliations

Department of Applied Mathematics, The M. S. University of Baroda, Vadodara, Gujarat, India
Jignesh Dholakia & S. Rama Mohan
Department of Computer and Information Sciences, University of Hyderabad, Hyderabad, India
Atul Negi

Corresponding author

Correspondence to Jignesh Dholakia.

Editor information

Editors and Affiliations

Center of Excellence for Document Analysis & Recognition (CEDAR), Lee Entrance 520, Amherst, 14228, U.S.A.
Venu Govindaraju & Srirangaraj (Ranga) Setlur

About this chapter

Cite this chapter

Dholakia, J., Negi, A., Mohan, S.R. (2009). Progress in Gujarati Document Processing and Character Recognition. In: Govindaraju, V., Setlur, S. (eds) Guide to OCR for Indic Scripts. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-330-9_4

DOI: https://doi.org/10.1007/978-1-84800-330-9_4
Published: 28 August 2009
Publisher Name: Springer, London
Print ISBN: 978-1-84800-329-3
Online ISBN: 978-1-84800-330-9
eBook Packages: Computer Science, Computer Science (R0)
