Recognition of Malayalam Documents

Guide to OCR for Indic Scripts

Part of the book series: Advances in Pattern Recognition (ACVPR)

Abstract

Malayalam is an Indian language with its own script, spoken by 40 million people, and it has a rich literary tradition. A character recognition system for this language would be of immense help in a spectrum of applications ranging from data entry to reading aids. The Malayalam script has a large number of similar characters, which makes the recognition problem challenging. In this chapter, we present our approach to the recognition of Malayalam documents, both printed and handwritten. We also report classification results and describe ongoing activities.

Author information

Corresponding author

Correspondence to N.V. Neeba.

Copyright information

© 2009 Springer-Verlag London Limited

About this chapter

Cite this chapter

Neeba, N., Namboodiri, A., Jawahar, C., Narayanan, P. (2009). Recognition of Malayalam Documents. In: Govindaraju, V., Setlur, S. (eds) Guide to OCR for Indic Scripts. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-330-9_6

  • DOI: https://doi.org/10.1007/978-1-84800-330-9_6

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-329-3

  • Online ISBN: 978-1-84800-330-9

  • eBook Packages: Computer Science (R0)
