Abstract
Considerable success has been achieved in developing monolingual OCR systems for Indian scripts. But in a country like India, where many languages and scripts coexist, a single document commonly contains words from more than one script. A script identification system is therefore required to select the appropriate OCR. This paper presents a comparative analysis of two feature extraction techniques for word-level script identification. Discriminating features and Gabor-filter-based features are computed for Punjabi words and English numerals. The extracted features are passed to k-NN and SVM classifiers to identify the script, and the resulting recognition rates are compared. We observe that selecting an appropriate value of k, an appropriate kernel function, and a suitable combination of feature extraction and classification schemes yields a significant drop in error rate.
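The pipeline described in the abstract (Gabor-based features followed by a k-NN vote) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' configuration: the kernel parameters, the four orientations, the mean-squared-response feature, the synthetic stripe images standing in for word images, and the labels `script_a` and `script_b`. The SVM branch is omitted, and only the Python standard library is used.

```python
import math
from collections import Counter

def gabor_kernel(theta, wavelength=4.0, sigma=2.0, size=7):
    """Real (cosine) Gabor kernel at orientation theta; parameters are illustrative."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    # Subtract the mean so that a uniform region produces zero response.
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]

def filter_energy(image, kernel):
    """Mean squared filter response over all fully overlapping positions."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    total, count = 0.0, 0
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            r = sum(kernel[a][b] * image[i + a][j + b]
                    for a in range(k) for b in range(k))
            total += r * r
            count += 1
    return total / count

def gabor_features(image, n_orientations=4):
    """One energy value per orientation in [0, pi)."""
    return [filter_energy(image, gabor_kernel(math.pi * n / n_orientations))
            for n in range(n_orientations)]

def knn_predict(train, query, k=3):
    """Majority vote among the k nearest (Euclidean) training vectors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def stripes(size, period, vertical, phase=0):
    """Synthetic stand-in for a word image: binary stripes (hypothetical data)."""
    return [[1.0 if ((x if vertical else y) + phase) % period < period // 2 else 0.0
             for x in range(size)] for y in range(size)]

# Hypothetical two-class training set: vertical vs. horizontal strokes.
train = [(gabor_features(stripes(16, 4, True, p)), "script_a") for p in range(3)]
train += [(gabor_features(stripes(16, 4, False, p)), "script_b") for p in range(3)]

prediction = knn_predict(train, gabor_features(stripes(16, 4, True, phase=3)), k=3)
print(prediction)
```

In this sketch the value of k plays the role the abstract attributes to it: it sets how many neighbours vote, so it would normally be tuned on held-out data rather than fixed at 3.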
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rani, R., Dhir, R., Lehal, G.S. (2011). Comparative Analysis of Gabor and Discriminating Feature Extraction Techniques for Script Identification. In: Singh, C., Singh Lehal, G., Sengupta, J., Sharma, D.V., Goyal, V. (eds) Information Systems for Indian Languages. ICISIL 2011. Communications in Computer and Information Science, vol 139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19403-0_27
DOI: https://doi.org/10.1007/978-3-642-19403-0_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19402-3
Online ISBN: 978-3-642-19403-0
eBook Packages: Computer Science, Computer Science (R0)