Abstract
In this work, we propose a method for identifying text lines in historical Kannada documents. The method consists of three stages: first, the image is preprocessed (binarized) using Sauvola's method; next, connected component analysis and the projection profile method are applied to locate each text line; finally, each text line is segmented based on the projection points. The method is evaluated on seventeen historical Kannada documents containing 217 text lines in total. We tried several trial-and-error approaches to identify the lines in the document images. The first method detected 140 lines, but multiple lines appeared between adjacent text lines; its accuracy was 64.51%. The second method detected 107 lines, an accuracy of 49.30%. The third method clearly detected 178 lines, with fewer extra lines between the text lines, for an accuracy of 82.02%. We conclude that the third method detects most of the lines precisely and gives encouraging results.
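The projection profile step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the page has already been binarized (for example by Sauvola's method) into rows of 0 (background) and 1 (ink), it omits the connected component stage, and the names `row_profile`, `segment_lines`, `detection_rate` and the `min_ink` threshold are hypothetical. Plain lists are used so the sketch has no dependencies.

```python
def row_profile(binary):
    """Horizontal projection profile: number of ink pixels in each row."""
    return [sum(row) for row in binary]


def segment_lines(binary, min_ink=1):
    """Return (top, bottom) row ranges of text lines, bottom exclusive.

    A row belongs to a text line when its ink count is at least min_ink
    (an assumed threshold); consecutive such rows form one line.
    """
    lines = []
    start = None
    for y, ink in enumerate(row_profile(binary)):
        if ink >= min_ink and start is None:
            start = y                      # entering a text line
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # leaving a text line
            start = None
    if start is not None:                  # line touches the bottom edge
        lines.append((start, len(binary)))
    return lines


def detection_rate(detected, total):
    """Percentage of ground-truth lines that were detected."""
    return 100.0 * detected / total


# Toy binarized page: two text lines separated by a blank row.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

line_ranges = segment_lines(page)
rate = detection_rate(178, 217)  # counts reported for the third method
```

On real historical pages, touching or skewed lines rarely leave fully blank rows, which is one reason a pure profile cut can produce extra or missing line boundaries.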
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ravi, P., Naveena, C., Sharath Kumar, Y.H., Manjunath Aradhya, V.N. (2020). Text-Line Extraction from Historical Kannada Document. In: Satapathy, S., Bhateja, V., Nguyen, B., Nguyen, N., Le, DN. (eds) Frontiers in Intelligent Computing: Theory and Applications. Advances in Intelligent Systems and Computing, vol 1014. Springer, Singapore. https://doi.org/10.1007/978-981-13-9920-6_28
DOI: https://doi.org/10.1007/978-981-13-9920-6_28
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9919-0
Online ISBN: 978-981-13-9920-6
eBook Packages: Intelligent Technologies and Robotics (R0)