Text-Line Extraction from Historical Kannada Document

Ravi, P.; Naveena, C.; Sharath Kumar, Y. H.; Manjunath Aradhya, V. N.

doi:10.1007/978-981-13-9920-6_28

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1014)

Abstract

In this work, we propose a method for identifying text lines in historical Kannada documents. The proposed method consists of three stages: first, the image is preprocessed using Sauvola’s method; next, connected component analysis and the projection profile method are applied to detect the position of each text line; finally, each text line is segmented based on the projection points. The method is evaluated on seventeen historical Kannada documents containing 217 text lines in total. We tried a few trial-and-error methods to identify the lines in the historical document images. The first method detected 140 lines, but multiple lines were seen between each text line; its accuracy was 64.51%. The second method detected 107 lines, for an accuracy of 49.30%. The third method clearly detected 178 lines, with a reduced number of lines in between the text lines, for an accuracy of 82.02%. Hence, we conclude that the third method detected most of the lines precisely and gave encouraging results.
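The projection-profile stage named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the page has already been binarized (for example by Sauvola thresholding) into rows of 0/1 pixels, and the function names, the `min_ink` and `min_height` parameters, and the toy page are all hypothetical. The `detection_rate` helper further assumes that accuracy means detected lines divided by total lines, which is consistent with the figures reported above.

```python
from typing import List, Tuple


def horizontal_profile(binary: List[List[int]]) -> List[int]:
    """Count foreground (1) pixels in each row of a binarized page."""
    return [sum(row) for row in binary]


def segment_lines(profile: List[int], min_ink: int = 1,
                  min_height: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, where the
    profile stays at or above min_ink for at least min_height rows."""
    lines: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # entering a text band
        elif ink < min_ink and start is not None:
            if y - start >= min_height:    # leaving a band: keep if tall enough
                lines.append((start, y))
            start = None
    # Close a band that runs to the last row of the page.
    if start is not None and len(profile) - start >= min_height:
        lines.append((start, len(profile)))
    return lines


def detection_rate(detected: int, total: int) -> float:
    """Assumed accuracy measure: detected lines over total lines, in percent."""
    return 100.0 * detected / total


# Toy 8x6 "page" with two text bands separated by blank rows (hypothetical data).
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
profile = horizontal_profile(page)
lines = segment_lines(profile)
print(profile)   # [0, 3, 3, 0, 0, 5, 3, 0]
print(lines)     # [(1, 3), (5, 7)]
```

Under the assumed accuracy measure, 140/217, 107/217 and 178/217 agree with the reported 64.51%, 49.30% and 82.02% once the last digit is truncated. On real historical pages, touching or skewed lines rarely leave fully blank rows, which is presumably why the abstract combines the profile with connected component analysis; the thresholds in this sketch would need tuning per document.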

Author information

Authors and Affiliations

Department of Computer Science and Engineering, VTU-RRC, Belagavi, India
P. Ravi
Department of Computer Science and Engineering, SJB Institute of Technology, Bengaluru, India
C. Naveena
Department of Information Science & Engineering, Maharaja Institute of Technology, Mysuru, India
Y. H. Sharath Kumar
Department of MCA, JSS Science and Technology University, Mysuru, India
V. N. Manjunath Aradhya

Corresponding author

Correspondence to P. Ravi.

Editor information

Editors and Affiliations

School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, Odisha, India
Suresh Chandra Satapathy
Department of Electronics and Communication Engineering, SRMGPC, Lucknow, Uttar Pradesh, India
Vikrant Bhateja
Duy Tan University, Da Nang, Vietnam
Bao Le Nguyen
Graduate School, Duy Tan University, Da Nang, Vietnam
Nhu Gia Nguyen
Faculty of Information Technology, Hai Phong University, Hai Phong, Vietnam
Dac-Nhuong Le

About this paper

Cite this paper

Ravi, P., Naveena, C., Sharath Kumar, Y.H., Manjunath Aradhya, V.N. (2020). Text-Line Extraction from Historical Kannada Document. In: Satapathy, S., Bhateja, V., Nguyen, B., Nguyen, N., Le, DN. (eds) Frontiers in Intelligent Computing: Theory and Applications. Advances in Intelligent Systems and Computing, vol 1014. Springer, Singapore. https://doi.org/10.1007/978-981-13-9920-6_28

DOI: https://doi.org/10.1007/978-981-13-9920-6_28
Published: 02 October 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9919-0
Online ISBN: 978-981-13-9920-6
eBook Packages: Intelligent Technologies and Robotics (R0)
