Skip to main content

Line-Level Script Identification for Six Handwritten Scripts Using Texture Based Features

  • Conference paper
  • First Online:

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 340))

Abstract

Script identification from a given document image has some important applicability in many computer applications such as automatic archiving of multilingual documents, searching online archives of document images and for the selection of script specific Optical Character Recognition (OCR) engine in any multilingual environment. In this paper, we propose a texture based approach for text line-level script identification of six handwritten scripts namely, Bangla, Devnagari, Malayalam, Tamil, Telugu and Roman. A set of 80 features based on Gray Level Co-occurrence Matrix (GLCM) has been designed for the present work. Multi Layer Perceptron (MLP) is found to be the best classifier among a set of popular multiple classifiers which is then extensively tested by tuning different parameters. Finally, an accuracy of 95.67 % has been achieved on a dataset of 600 text lines using 3-fold cross validation with epoch size 1,500 of MLP classifier.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   169.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   219.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Pal, U., Chaudhuri, B.B.: Script line separation from indian multi-script documents. In: Proceedings of 5th International Conference on Document Analysis and Recognition (ICDAR), pp. 406–409. (1999)

    Google Scholar 

  2. Pal, U., Chaudhuri, B.B.: Identification of different script lines from multi-script documents. Image Vis. Comput. 20(13–14), 945–954 (2002)

    Article  Google Scholar 

  3. Pal, U., Sinha, S., Chaudhuri, B.B.: Multi-script line identification from indian documents. In: Proceedings of 7th International Conference on Document Analysis and Recognition (ICDAR), pp. 880–884. (2003)

    Google Scholar 

  4. Joshi, G.D., Garg, S., Sivaswamy, J.: Script identification from Indian documents. In: International Workshop Document Analysis Systems, Nelson. Lecture Notes in Computer Science, vol. 3872, pp. 255–267. (2006)

    Google Scholar 

  5. Padma, M.C., Vijaya, P.A.: Identification of Telugu, Devnagari and English scripts using discriminating features. Int. J. Comput. Sci. Inf. Technol. (IJCSIT) 1(2), 64–78 (2009)

    Google Scholar 

  6. Gopakumar, R., SubbaReddy, N.V., Makkithaya, K., Dinesh Acharya, U.: Script identification from multilingual indian documents using structural features. J. Comput. 2(7), 106–111 (2010)

    Google Scholar 

  7. Chaudhuri, B.B., Bera, S.: Handwritten text line identification in Indian scripts. In: Proceedings of 10th International Conference on Document Analysis and Recognition, pp. 636–640. (2009)

    Google Scholar 

  8. Hangarge, M., Dhandra, B.V.: Offline handwritten script identification in document images. Int. J. Comput. Appl. (IJCA) 4(6), 6–10 (2010)

    Google Scholar 

  9. Haralick, R.M., Shanmungam, K., Dinstein, I.: Textural features of image classification. IEEE Trans. Syst. Man, Cybern. 3, 610–621 (1973)

    Article  Google Scholar 

  10. Haralick, R.M., Watson, L.: A facet model for image data. Comput. Vision Graph. Image Process. 15, 113–129 (1981)

    Article  Google Scholar 

  11. Busch, A., Boles, W.W., Sridharan, S.: Texture for script identification. IEEE Trans. Pattern Anal. Mach. Intell. 27, 1720–1732 (2005)

    Article  Google Scholar 

  12. Gonzalez, R.C., Woods, R.E.: Digital Image Processing, vol. I. PHI, New Delhi (1992)

    Google Scholar 

  13. Sarkar, R., Das, N., Basu, S., Kundu, M., Nasipuri, M.: Extraction of text lines from handwritten documents using piecewise water flow technique. J. Intell. Syst. 23(3), 245–260 (2014)

    Google Scholar 

  14. Ostu, N.: A thresholding selection method from gray-level histogram. IEEE Trans. Syst. Man Cybern. SMC-8, 62–66 (1978)

    Google Scholar 

  15. www.cs.waikato.ac.nz/ml/weka/documentation.html

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Pawan Kumar Singh .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer India

About this paper

Cite this paper

Singh, P.K., Sarkar, R., Nasipuri, M. (2015). Line-Level Script Identification for Six Handwritten Scripts Using Texture Based Features. In: Mandal, J., Satapathy, S., Kumar Sanyal, M., Sarkar, P., Mukhopadhyay, A. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 340. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2247-7_30

Download citation

  • DOI: https://doi.org/10.1007/978-81-322-2247-7_30

  • Published:

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-2246-0

  • Online ISBN: 978-81-322-2247-7

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics