Abstract
Script identification from document images is useful in applications such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific Optical Character Recognition (OCR) engine in a multilingual environment. In this paper, we propose a texture-based approach to text-line-level script identification for six handwritten scripts, namely Bangla, Devnagari, Malayalam, Tamil, Telugu and Roman. A set of 80 features based on the Gray Level Co-occurrence Matrix (GLCM) has been designed for the present work. Among a set of popular classifiers, the Multi Layer Perceptron (MLP) is found to perform best, and it is then tested extensively under different parameter settings. Finally, an accuracy of 95.67% is achieved on a dataset of 600 text lines using 3-fold cross-validation, with the MLP trained for 1,500 epochs.
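To make the idea of GLCM-based texture features concrete, the following is a minimal sketch rather than the feature set used in the paper. The abstract does not say which 80 features were designed, so the sketch assumes four common Haralick-style statistics (contrast, energy, homogeneity, entropy) computed at four pixel offsets, which gives 16 values per text line. The function names `glcm`, `texture_features` and `line_features` are hypothetical, and the input is assumed to be a 2-D array of integer gray levels in the range 0 to `levels - 1`.

```python
import numpy as np


def glcm(image, dx, dy, levels):
    """Normalised, symmetric co-occurrence matrix for the offset (dx, dy).

    dx is the column displacement and dy the row displacement between a
    pixel and the neighbour it is paired with.
    """
    img = np.asarray(image, dtype=np.intp)
    h, w = img.shape
    # Restrict to pixels whose displaced neighbour is still inside the image.
    r0, r1 = max(0, -dy), min(h, h - dy)
    c0, c1 = max(0, -dx), min(w, w - dx)
    src = img[r0:r1, c0:c1]
    dst = img[r0 + dy:r1 + dy, c0 + dx:c1 + dx]
    counts = np.zeros((levels, levels), dtype=np.float64)
    # Unbuffered accumulation so repeated (i, j) pairs are all counted.
    np.add.at(counts, (src.ravel(), dst.ravel()), 1.0)
    counts += counts.T  # count each pair in both directions
    return counts / counts.sum()


def texture_features(p):
    """Contrast, energy, homogeneity and entropy of a normalised GLCM."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sum(p ** 2)
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log2(nonzero))
    return np.array([contrast, energy, homogeneity, entropy])


def line_features(image, levels):
    """Concatenate the four statistics over four directions (assumed choice)."""
    # Offsets for 0, 45, 90 and 135 degrees at distance 1.
    offsets = [(1, 0), (1, -1), (0, -1), (-1, -1)]
    return np.concatenate(
        [texture_features(glcm(image, dx, dy, levels)) for dx, dy in offsets]
    )


if __name__ == "__main__":
    # Small 4-level test image, used only to exercise the functions.
    toy = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [0, 2, 2, 2],
                    [2, 2, 3, 3]])
    print(line_features(toy, levels=4))
```

In a pipeline like the one the abstract describes, one such vector per text-line image would be passed to a classifier; the quantisation level, offsets and statistics here are illustrative choices, not the authors' settings.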
© 2015 Springer India
About this paper
Cite this paper
Singh, P.K., Sarkar, R., Nasipuri, M. (2015). Line-Level Script Identification for Six Handwritten Scripts Using Texture Based Features. In: Mandal, J., Satapathy, S., Kumar Sanyal, M., Sarkar, P., Mukhopadhyay, A. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 340. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2247-7_30
DOI: https://doi.org/10.1007/978-81-322-2247-7_30
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2246-0
Online ISBN: 978-81-322-2247-7
eBook Packages: Engineering, Engineering (R0)