Identification of Devnagari and Roman Scripts from Multi-script Handwritten Documents

Singh, Pawan Kumar; Sarkar, Ram; Das, Nibaran; Basu, Subhadip; Nasipuri, Mita

doi:10.1007/978-3-642-45062-4_70

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8251)

Included in the following conference series:

International Conference on Pattern Recognition and Machine Intelligence

Abstract

In a multilingual country like India, a handwritten text document commonly contains more than one script. This makes such a document difficult to digitize, because the language type of the text must be determined before it is fed into a suitable Optical Character Recognition (OCR) system. This paper reports an intelligent feature-based technique that automatically identifies the script of each handwritten word on a document page written in Devnagari script mixed with Roman script. Word-level script identification is performed by a Multi Layer Perceptron (MLP) based classifier using 39 distinctive features. The technique is tested on 100 handwritten document pages containing both Devnagari and Roman script words, and 99.54% of the words are assigned to their true class.
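To make the classification step concrete, the following is a minimal sketch of a two-class MLP operating on 39-dimensional word feature vectors. It is not the authors' implementation: the abstract does not list the 39 features or the network configuration, so the feature values here are random placeholders, and the names `make_synthetic_words` and `TinyMLP`, the hidden layer size, the learning rate, and the 0/1 label coding are all assumptions made for illustration.

```python
import math
import random

N_FEATURES = 39  # feature count stated in the abstract


def make_synthetic_words(n_per_class, rng):
    """Return (features, label) pairs. The features are random placeholders,
    NOT the paper's features, which the abstract does not describe."""
    data = []
    # Assumed label coding: 0 = Devnagari word, 1 = Roman word.
    for label, centre in ((0, -0.5), (1, 0.5)):
        for _ in range(n_per_class):
            features = [rng.gauss(centre, 1.0) for _ in range(N_FEATURES)]
            data.append((features, label))
    rng.shuffle(data)
    return data


class TinyMLP:
    """One tanh hidden layer and one sigmoid output, trained by plain SGD."""

    def __init__(self, n_in, n_hidden, rng):
        scale = 1.0 / math.sqrt(n_in)
        self.w1 = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-scale, scale) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        z = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        z = max(-60.0, min(60.0, z))  # clamp to keep exp() in range
        return hidden, 1.0 / (1.0 + math.exp(-z))

    def train_step(self, x, y, lr):
        hidden, p = self.forward(x)
        # Gradient of the cross-entropy loss w.r.t. the output pre-activation.
        delta_out = p - y
        for j, h in enumerate(hidden):
            # Back-propagate through tanh using the not-yet-updated w2[j].
            delta_h = delta_out * self.w2[j] * (1.0 - h * h)
            self.w2[j] -= lr * delta_out * h
            self.b1[j] -= lr * delta_h
            row = self.w1[j]
            for i, v in enumerate(x):
                row[i] -= lr * delta_h * v
        self.b2 -= lr * delta_out

    def predict(self, x):
        return 1 if self.forward(x)[1] >= 0.5 else 0


def accuracy(model, data):
    return sum(model.predict(x) == y for x, y in data) / len(data)


rng = random.Random(0)
train = make_synthetic_words(200, rng)
test = make_synthetic_words(100, rng)

model = TinyMLP(N_FEATURES, n_hidden=8, rng=rng)
for _ in range(20):  # epochs
    for x, y in train:
        model.train_step(x, y, lr=0.02)

test_accuracy = accuracy(model, test)
print(f"held-out accuracy on synthetic data: {test_accuracy:.3f}")
```

The accuracy printed here describes only the synthetic placeholder data and says nothing about the 99.54% figure reported in the abstract. In a real pipeline, each word image would first be segmented from the page and converted into its 39 feature values before being passed to `predict`.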

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
Pawan Kumar Singh, Ram Sarkar, Nibaran Das, Subhadip Basu & Mita Nasipuri

Editor information

Editors and Affiliations

Machine Intelligence Unit, Indian Statistical Institute, 203, B. T. Road, 700108, Kolkata, India
Pradipta Maji, Ashish Ghosh, Kuntal Ghosh & Sankar K. Pal
Department of Computer Science and Automation, Indian Institute of Science, 560012, Bangalore, India
M. Narasimha Murty

About this paper

Cite this paper

Singh, P.K., Sarkar, R., Das, N., Basu, S., Nasipuri, M. (2013). Identification of Devnagari and Roman Scripts from Multi-script Handwritten Documents. In: Maji, P., Ghosh, A., Murty, M.N., Ghosh, K., Pal, S.K. (eds) Pattern Recognition and Machine Intelligence. PReMI 2013. Lecture Notes in Computer Science, vol 8251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45062-4_70

DOI: https://doi.org/10.1007/978-3-642-45062-4_70
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45061-7
Online ISBN: 978-3-642-45062-4
eBook Packages: Computer Science, Computer Science (R0)
