Abstract
Malayalam is an Indian language with its own script, spoken by 40 million people, and it has a rich literary tradition. A character recognition system for this language would be of immense help in a spectrum of applications ranging from data entry to reading aids. The Malayalam script has a large number of similar characters, which makes the recognition problem challenging. In this chapter, we present our approach to the recognition of Malayalam documents, both printed and handwritten, and report classification results as well as ongoing activities.
Copyright information
© 2009 Springer-Verlag London Limited
About this chapter
Cite this chapter
Neeba, N., Namboodiri, A., Jawahar, C., Narayanan, P. (2009). Recognition of Malayalam Documents. In: Govindaraju, V., Setlur, S. (eds) Guide to OCR for Indic Scripts. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-330-9_6
DOI: https://doi.org/10.1007/978-1-84800-330-9_6
Publisher Name: Springer, London
Print ISBN: 978-1-84800-329-3
Online ISBN: 978-1-84800-330-9
eBook Packages: Computer Science, Computer Science (R0)