
Comparative Analysis of Gabor and Discriminating Feature Extraction Techniques for Script Identification

  • Conference paper
Information Systems for Indian Languages (ICISIL 2011)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 139))


Abstract

Considerable success has been achieved in developing monolingual OCR systems for Indian scripts. But in a country like India, where many languages and scripts coexist, a single document commonly contains words from more than one script. A script identification system is therefore required to select the appropriate OCR. This paper presents a comparative analysis of two feature extraction techniques for word-level script identification. Discriminating features and Gabor filter based features are computed for Punjabi words and English numerals. The extracted features are classified with k-NN and SVM classifiers to identify the script, and the recognition rates are then compared. It has been observed that selecting an appropriate value of k, an appropriate kernel function, and an appropriate combination of feature extraction and classification scheme yields a significant drop in error rate.
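The Gabor-plus-k-NN pipeline named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' configuration: the filter size, wavelength, sigma, number of orientations, the mean-squared-response "energy" feature, and the synthetic stripe images standing in for word images. The abstract does not state these details, and the SVM classifier and the discriminating features are not covered.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma):
    """Real (cosine) part of a Gabor kernel as a size x size nested list."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

def filter_energy(image, kernel):
    """Mean squared filter response over all fully overlapping positions."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    total, count = 0.0, 0
    for i in range(rows - k + 1):
        for j in range(cols - k + 1):
            resp = sum(kernel[a][b] * image[i + a][j + b]
                       for a in range(k) for b in range(k))
            total += resp * resp
            count += 1
    return total / count

def gabor_features(image, orientations=4, size=5, wavelength=4.0, sigma=2.0):
    """One energy value per orientation (assumed, illustrative parameters)."""
    return [
        filter_energy(image, gabor_kernel(size, wavelength,
                                          n * math.pi / orientations, sigma))
        for n in range(orientations)
    ]

def knn_predict(train, query, k=1):
    """Majority vote among the k nearest (feature_vector, label) pairs."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def stripes(n, vertical, shift=0):
    """Synthetic n x n binary image with 2-pixel stripes (toy stand-in data)."""
    return [[1.0 if (((j if vertical else i) + shift) // 2) % 2 == 0 else 0.0
             for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    train = [
        (gabor_features(stripes(12, vertical=True)), "vertical"),
        (gabor_features(stripes(12, vertical=False)), "horizontal"),
    ]
    query = gabor_features(stripes(12, vertical=True, shift=1))
    print(knn_predict(train, query, k=1))
```

In a real word-level system the toy stripe images would be replaced by normalized word images, and k (or the SVM kernel) would be tuned on held-out data, which is the selection step the abstract credits with the drop in error rate.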




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rani, R., Dhir, R., Lehal, G.S. (2011). Comparative Analysis of Gabor and Discriminating Feature Extraction Techniques for Script Identification. In: Singh, C., Singh Lehal, G., Sengupta, J., Sharma, D.V., Goyal, V. (eds) Information Systems for Indian Languages. ICISIL 2011. Communications in Computer and Information Science, vol 139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19403-0_27


  • DOI: https://doi.org/10.1007/978-3-642-19403-0_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19402-3

  • Online ISBN: 978-3-642-19403-0

  • eBook Packages: Computer Science, Computer Science (R0)
