Abstract
In a multilingual country like India, a handwritten text document commonly contains more than one script. This makes such a document difficult to digitize, because the language type of the text must be determined before it is fed into a suitable Optical Character Recognition (OCR) system. This paper reports an intelligent feature-based technique that automatically identifies the script of each handwritten word on a document page written in Devnagari script mixed with Roman script. Word-level script identification is performed by a Multi Layer Perceptron (MLP) based classifier using 39 distinctive features. The technique is tested on 100 handwritten document pages containing both Devnagari and Roman script words, and 99.54% of the words are assigned to their true class.
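The word-level classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives only the feature count (39) and the two target scripts, so the hidden-layer size, the sigmoid activation, the random placeholder weights, and all function names below are assumptions made purely for illustration. A trained network would replace the random weights, and real feature vectors would be computed from word images.

```python
import math
import random

N_FEATURES = 39          # feature count stated in the abstract
N_HIDDEN = 20            # assumption: hidden-layer size is not given in the abstract
SCRIPTS = ("Devnagari", "Roman")


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) filled with small random placeholder values."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Apply one fully connected layer followed by a sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def classify_word(features, hidden, output):
    """Map a 39-dimensional word feature vector to a script label and scores."""
    if len(features) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {len(features)}")
    hidden_activations = layer_forward(features, *hidden)
    scores = layer_forward(hidden_activations, *output)
    best = max(range(len(SCRIPTS)), key=lambda i: scores[i])
    return SCRIPTS[best], scores


rng = random.Random(0)
hidden_layer = init_layer(N_FEATURES, N_HIDDEN, rng)   # untrained placeholder weights
output_layer = init_layer(N_HIDDEN, len(SCRIPTS), rng)

# Placeholder feature vector; real features would come from a word image.
dummy_features = [rng.random() for _ in range(N_FEATURES)]
label, scores = classify_word(dummy_features, hidden_layer, output_layer)
```

Because the weights are untrained, the predicted label here carries no meaning; the sketch only shows the shape of the mapping from a 39-feature vector to one of two script classes.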
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Singh, P.K., Sarkar, R., Das, N., Basu, S., Nasipuri, M. (2013). Identification of Devnagari and Roman Scripts from Multi-script Handwritten Documents. In: Maji, P., Ghosh, A., Murty, M.N., Ghosh, K., Pal, S.K. (eds) Pattern Recognition and Machine Intelligence. PReMI 2013. Lecture Notes in Computer Science, vol 8251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45062-4_70
DOI: https://doi.org/10.1007/978-3-642-45062-4_70
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45061-7
Online ISBN: 978-3-642-45062-4
eBook Packages: Computer Science, Computer Science (R0)