Automatic Text-Line Level Handwritten Indic Script Recognition: A Two-Stage Framework

Singh, Pawan Kumar; Mukhopadhyay, Anirban; Sarkar, Ram; Nasipuri, Mita

doi:10.1007/978-981-10-7563-6_39

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 701)

Abstract

The script dependency of Optical Character Recognition (OCR) systems is a major obstacle to the digitization of document images in a multi-script environment. To date, researchers have developed various feature extraction and classification methodologies, but most of them are limited to bi-script and tri-script scenarios. The present work proposes an automatic two-stage framework for text-line level script recognition from handwritten document images written in 12 Indic scripts. A text-line misclassified at the first stage is examined further: it is segmented into its constituent words, and the script recognition module is applied again to the resulting words. Combining the outcomes of the two stages helps improve the overall accuracy of text-line level script classification.
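The abstract does not state how a first-stage misclassification is detected or how the word-level results are pooled. The following minimal Python sketch illustrates the control flow only, under two loudly labeled assumptions: a text-line whose first-stage confidence falls below a threshold is treated as a candidate misclassification, and word-level script labels are pooled by majority vote. The function `two_stage_line_script`, the `threshold` value, and the classifier and word-segmenter callables are hypothetical placeholders, not the authors' implementation.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# Hypothetical interface: a classifier maps an image to (script_label, confidence).
Classifier = Callable[[object], Tuple[str, float]]


def two_stage_line_script(
    line_image: object,
    classify_line: Classifier,
    classify_word: Classifier,
    segment_words: Callable[[object], Sequence[object]],
    threshold: float = 0.8,  # assumed cut-off, not taken from the paper
) -> str:
    """Return a script label for one text-line using a two-stage decision."""
    # Stage 1: classify the whole text-line.
    label, confidence = classify_line(line_image)
    if confidence >= threshold:
        return label

    # Stage 2 (assumption: low confidence stands in for "misclassified"):
    # segment the line into words and classify each word separately.
    words = segment_words(line_image)
    if not words:
        return label  # nothing to re-examine, keep the stage-1 label

    # Pool the word-level labels by majority vote (assumed pooling rule).
    votes = Counter(classify_word(word)[0] for word in words)
    best = max(votes.values())
    tied = [script for script, count in votes.items() if count == best]
    # Prefer the stage-1 label on a tie, else the first tied label seen.
    return label if label in tied else tied[0]


if __name__ == "__main__":
    # Toy stubs: a "line" is a list of word labels, each word classifies as itself.
    line = ["bn", "bn", "dv"]
    result = two_stage_line_script(
        line,
        classify_line=lambda img: ("dv", 0.5),
        classify_word=lambda w: (w, 1.0),
        segment_words=lambda img: img,
    )
    print(result)  # bn
```

In the toy run, the low-confidence first-stage label `"dv"` is overturned by the word-level vote, which favors `"bn"` two to one. A confidence gate is used here only because a deployed system cannot know ground truth at test time; the paper may use a different criterion.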

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Jadavpur University, 188, Raja S.C. Mullick Road, Kolkata, 700032, West Bengal, India
Pawan Kumar Singh, Anirban Mukhopadhyay, Ram Sarkar & Mita Nasipuri

Corresponding author

Correspondence to Pawan Kumar Singh.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, PVP Siddhartha Institute of Technology, Vijayawada, Andhra Pradesh, India
Suresh Chandra Satapathy
Departamento de Engenharia Mecânica, Universidade do Porto, Porto, Portugal
Joao Manuel R.S. Tavares
Department of Electronics and Communication Engineering, SRMGPC, Lucknow, Uttar Pradesh, India
Vikrant Bhateja
School of Computer Application, KIIT University, Bhubaneswar, Odisha, India
J. R. Mohanty

About this paper

Cite this paper

Singh, P.K., Mukhopadhyay, A., Sarkar, R., Nasipuri, M. (2018). Automatic Text-Line Level Handwritten Indic Script Recognition: A Two-Stage Framework. In: Satapathy, S., Tavares, J., Bhateja, V., Mohanty, J. (eds) Information and Decision Sciences. Advances in Intelligent Systems and Computing, vol 701. Springer, Singapore. https://doi.org/10.1007/978-981-10-7563-6_39

DOI: https://doi.org/10.1007/978-981-10-7563-6_39
Published: 14 April 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7562-9
Online ISBN: 978-981-10-7563-6
eBook Packages: Engineering, Engineering (R0)
