
Age-Type Identification and Recognition of Historical Kannada Handwritten Document Images Using HOG Feature Descriptors

  • Parashuram Bannigidad
  • Chandrashekar Gudada
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 810)

Abstract

Most historical Kannada handwritten documents are preserved in manuscript preservation centres and archaeological departments. These documents are generally degraded, and the degradation makes their contents difficult or impossible to read and understand. It is therefore essential to restore them by digitization and to identify the dynasty to which each document belongs. The main objective of this research work is to reconstruct, digitize and recognize historical Kannada handwritten document images by applying image enhancement techniques, extracting HOG feature descriptors and classifying them with K-nearest neighbour (K-NN) and SVM classifiers. In this paper, we consider historical Kannada handwritten document images of different dynasties based on their age-type: Vijayanagara dynasty (1460 AD), Mysore Wadiyar dynasty (1936 AD), Vijayanagara dynasty (1400 AD) and Hoysala dynasty (1340 AD). The average classification accuracy across these dynasties is 92.3% for the K-NN classifier and 96.7% for the SVM classifier, so the SVM classifier performs better than the K-NN classifier on historical Kannada handwritten document images. The experimental outcomes are compared with manual results and with other methods in the literature, which shows the thoroughness of the proposed technique.

Keywords

Restoration, Segmentation, Kannada, K-NN, SVM, Recognition, HOG, Handwritten script, Historical documents

1 Introduction

Kannada is the most widely spoken language and the official language of the state of Karnataka in India. Many Indian dynasties ruled in this area, and the Kannada script is the oldest among the 20 languages of the Dravidian family [1]. The Kannada script has evolved over a period of 1500 years, and there are many variations in the scripts used by different dynasties. The Kannada character set went through numerous changes during its use under the different dynasties [2]. The Kadamba script belongs to the first dynastic rule of Karnataka, in the fifth century AD. The Adi Ganga script was recognized between the fourth and sixth centuries AD and resembles the Kadamba script. The script used by the Badami Chalukya is called the Badami Chalukya script and can be seen in records of the sixth to seventh century AD. The Rastrakuta script was recognized during the eighth century AD. The Kalyana Chalukya script was recognized between the tenth and twelfth centuries AD. The Vijayanagara script was used between the fourteenth and sixteenth centuries AD. The Mysore Wadiyar dynasty was the last, ruling in the eighteenth to nineteenth centuries AD. In this period the Vijayanagara script evolved further and became almost identical to the Aadhunika Kannada script, which is in turn similar to the present-day Kannada handwritten script [3].

Work on degraded historical Kannada handwritten document images involves restoration, digitization, preservation, knowledge extraction, etc. In this paper, degraded historical Kannada handwritten document images of the Vijayanagara dynasty (1460 AD), Mysore Wadiyar dynasty (1936 AD), Vijayanagara dynasty (1400 AD) and Hoysala dynasty (1340 AD), written on paper (Hasataprati), are considered; samples are shown in Fig. 1. Many government organizations, such as the National Mission for Manuscript, New Delhi, and the Archaeological Survey of India, as well as manuscript preservation centres, universities, etc., are working on the digitization and preservation of degraded historical Kannada handwritten documents.
Fig. 1

Sample historical Kannada handwritten documents from different dynasties: (a) Vijayanagara dynasty (1460 AD), (b) Mysore Wadiyar dynasty (1936 AD), (c) Vijayanagara dynasty (1400 AD), (d) Hoysala dynasty (1340 AD)

Kannada language inscriptions are categorized based on their style of writing as Pre-old Kannada (Poorva Halagannada), Old-Kannada (Halagannada), Nadugannada and Aadhunika Kannada. The evolution of sample historical Kannada handwritten scripts of different dynasties is shown in Fig. 2.
Fig. 2

The evolution of sample historical Kannada handwritten scripts of different dynasties

Very few researchers have contributed to this area in the literature. Holistic word recognition for handwritten historical documents was proposed by Lavrenko et al. [4]. The recognition of handwritten text from historical documents in the tranScriptorium project was investigated by Sanchez et al. [5]. Character recognition using geometry-based features was presented by Dileep [6]. Romero et al. [7] proposed handwritten text recognition for historical documents. The recognition of words in historical classical Mongolian documents was investigated by Gao et al. [8]. The classification of ancient Kannada scripts using SVM and K-NN was carried out by Soumya et al. [9]. Prediction of the era of historical scripts using a Curvelet transform-based approach was done by Gangamma et al. [10]. Prediction of the era of a historical script using an SVM classifier was carried out by Soumya et al. [11]. A Gabor- and zonal-feature-based approach for recognition of historical documents was carried out by Soumya et al. [12]. Matching of stone-inscribed Kannada characters using SIFT was investigated by Mohana et al. [13]. Offline handwritten text recognition was proposed by Verma et al. [14]. Parashuram and Chandrashekar [15, 16] proposed an image enhancement method for degraded historical Kannada handwritten document images and assessed the quality of the enhanced images using performance evaluation measures, i.e. Precision, Recall, F-Measure, MSE and PSNR.

2 Proposed Method

The goal of the present investigation is to reconstruct, digitize and recognize historical Kannada handwritten document images by applying image enhancement techniques, extracting HOG feature descriptors and classifying them with K-nearest neighbour (K-NN) and SVM classifiers. The purpose of the recognition is to identify the script of the dynasty, i.e. whether a document belongs to the Hoysala dynasty, the Vijayanagara dynasty or the Mysore Wadiyar dynasty.

2.1 Data Collection

Standard datasets of historical Kannada handwritten documents are rarely found in the literature. The documents used here were collected by visiting institutions and universities such as the Department of P. G. Studies and Research in Kannada, Gulbarga University, Kalaburgi, and the Dept. of Hasataprati, Kannada University, Hampi. They were captured with a Canon 1300D 18-megapixel DSLR camera at a resolution of 5184 × 3456 and stored in JPEG format as soft copies, giving our own dataset with a total of 1200 images of historical handwritten Kannada documents.

2.2 Preprocessing

Historical documents commonly contain smears, uneven background illumination and spots due to age, as well as marks from ink that has seeped through the inscription (paper), generally called bleed-through. In addition, the writing style varies from inscription to inscription, which adds confusion and complexity to the recognition of historical documents. The preprocessing steps, which include image enhancement and restoration, therefore play an important role: they not only enhance the quality of the image but also remove unwanted objects, debris, noise, etc.

In earlier papers [15, 16], we proposed a new image enhancement technique for degraded historical Kannada handwritten document images. The overall approach of the proposed method is shown in Fig. 3 and is given below in the form of an algorithm.
Fig. 3

The detailed approach of the proposed algorithm

Algorithm: Age identification and classification of historical Kannada handwritten document images.

  Step 1. Input the camera-captured colour image.

  Step 2. Apply image enhancement techniques, namely the combination of Local Otsu and Global Otsu, to restore and binarize the given input image.

  Step 3. Apply the text block-wise segmentation method to the output of Step 2 to obtain blocks of size 512 × 512 and store them individually.

  Step 4. Repeat Steps 1–3 for all the Kannada handwritten document images of the different age-types, namely the Hoysala, Vijayanagara and Mysore dynasties.

  Step 5. Extract HOG feature descriptors from the blocks obtained in Step 4 and store them as a knowledge base.

  Step 6. Apply classification techniques, i.e. the K-nearest neighbour classifier [17] and the SVM classifier [18], to classify and recognize the historical Kannada handwritten document images as belonging to the Hoysala dynasty, the Vijayanagara dynasty or the Mysore dynasty.
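
Steps 2 and 3 can be illustrated with a minimal sketch. It is not the enhancement method of [15, 16]: it shows only a global Otsu threshold on a single-channel image and the cutting of the binarized result into fixed-size blocks. The function names, the convention that dark ink maps to 0 and light paper to 1, and the choice to drop partial blocks at the image border are assumptions made for illustration. A local Otsu step could reuse the same threshold function on each window, but the rule for combining local and global thresholds is not given here.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that maximises the between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # intensity sum of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(channel, threshold):
    """Assumed convention: pixels above the threshold become 1 (paper), others 0 (ink)."""
    return [[1 if p > threshold else 0 for p in row] for row in channel]


def split_into_blocks(image, size=512):
    """Cut a 2-D image into non-overlapping size x size blocks; partial edge blocks are dropped."""
    rows, cols = len(image), len(image[0])
    blocks = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            blocks.append([row[c:c + size] for row in image[r:r + size]])
    return blocks
```

For a real page, `channel` would hold the green-channel values of the photograph as a list of rows, and `size` would stay at its default of 512.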

     

2.3 Feature Selection and Extraction

The selection of features for a document image is an important task, since Kannada handwritten document images contain different shapes and styles. In this paper, we have used HOG feature descriptors [19], selected for their scale-invariant and rotation-invariant properties, for recognition of the Kannada handwritten documents.
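
To make the idea of a HOG descriptor concrete, the following is a simplified sketch rather than the descriptor of [19] or the implementation used in the experiments. It computes one orientation histogram per cell and normalises each cell on its own, whereas standard HOG normalises over overlapping blocks of cells. The cell size of 8 pixels, the 9 unsigned orientation bins and the function name are assumptions chosen for illustration.

```python
import math


def hog_descriptor(image, cell=8, bins=9):
    """Simplified HOG: one L2-normalised orientation histogram per cell."""
    rows, cols = len(image), len(image[0])
    bin_width = 180.0 / bins
    features = []
    for r0 in range(0, rows - cell + 1, cell):
        for c0 in range(0, cols - cell + 1, cell):
            hist = [0.0] * bins
            for r in range(r0, r0 + cell):
                for c in range(c0, c0 + cell):
                    # Central differences, clamped at the image border.
                    gx = image[r][min(c + 1, cols - 1)] - image[r][max(c - 1, 0)]
                    gy = image[min(r + 1, rows - 1)][c] - image[max(r - 1, 0)][c]
                    magnitude = math.hypot(gx, gy)
                    # Unsigned gradient orientation folded into [0, 180) degrees.
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0
                    hist[int(angle / bin_width) % bins] += magnitude
            norm = math.sqrt(sum(v * v for v in hist)) + 1e-6
            features.extend(v / norm for v in hist)
    return features
```

Under these assumptions a 512 × 512 text block yields 64 × 64 cells and therefore a vector of 64 × 64 × 9 = 36864 values; a larger cell size gives a shorter vector.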

3 Experimental Results and Discussion

For the purpose of experimentation, we have considered a dataset of 1200 images from different dynasties, namely Vijayanagara, Hoysala and Mysore Wadiyar, as described in Sect. 2.1.

The implementation was done in MATLAB R2015b on an Intel Core i5 system @ 2.40 GHz. The original camera-captured historical Kannada handwritten document image of the Vijayanagara dynasty (1460 AD) is shown in Fig. 4a. The camera-captured RGB colour image is converted into a greyscale image by considering only the green channel, and the combination of local Otsu and global Otsu is applied to reconstruct the individual characters and output the enhanced and binarized image (Fig. 4b). The block segmentation method is then applied to Fig. 4b to obtain individual text blocks of size 512 × 512 (Fig. 4c). The HOG feature descriptors are extracted from the blocks of Fig. 4c for all the images and stored as a knowledge base. Finally, the K-NN and SVM classifiers are applied for classification and recognition. The average classification accuracy of the proposed method with the K-NN and SVM classifiers is given in Table 1. A comparison of the proposed method with other methods in the literature is given in Table 2.
Fig. 4

Sample images of the proposed algorithm: (a) original camera-captured historical Kannada handwritten document image of the Vijayanagara dynasty (1460 AD), (b) binarized image after restoration, (c) text block segmentation of size 512 × 512 on (b)
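
The final classification step can be sketched for the K-NN case as follows. This is an illustrative sketch only: the value of k, the Euclidean distance and the function name are assumptions, since none of them is specified above. The SVM classifier is not sketched; it would be trained on the same stored feature vectors and labels.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training vectors nearest to the query."""
    distances = sorted(
        (math.dist(vector, query), label)
        for vector, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

Here `train_features` plays the role of the knowledge base of stored descriptors, `train_labels` holds the dynasty of each stored block, and `query` is the descriptor of a block to be recognized.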

Table 1

The classification accuracy of the proposed method with K-NN and SVM classifiers

Age-types (Dynasties) | K-NN recognition rate (%) | K-NN error rate (%) | SVM recognition rate (%) | SVM error rate (%)
Vijayanagara (1460 AD) | 98 | 2 | 98 | 2
Mysore Wadiyar (1936 AD) | 86 | 14 | 96 | 4
Vijayanagara (1400 AD) | 88 | 12 | 96 | 4
Hoysala (1340 AD) | 98 | 2 | 97 | 3
Average accuracy | 92.3 | | 96.7 |

Table 2

Comparison of the proposed method with other methods in the literature

Author | Method | Dataset size | Accuracy (%)
Karthik et al. [20] | Adaptive window sizing and histogram of oriented gradient | 200 | 94
Alaei et al. [21] | Potential piece-wise separation line technique | 204 | 94.98
Belagali et al. [22] | Zoning-based invariant moment feature | 980 | 94.69
Proposed method | Histogram of oriented gradients feature descriptors | 1200 | 96.70

The average classification accuracy across the different dynasties is 92.3% for the K-NN classifier and 96.7% for the SVM classifier. Based on the experimentation [23, 24], it is observed that the SVM classifier performs better than the K-NN classifier on historical Kannada handwritten document images. The confusion matrix of the K-NN classifier, which shows its performance on historical Kannada handwritten document images, is given in Table 3. The confusion matrix of the SVM classifier, given in Table 4, indicates better recognition rates on the same images.
Table 3

Confusion matrix for K-NN classifier

Columns: Age-types (Dynasties) | Vijayanagara (1460 AD) | Mysore Wadiyar (1936 AD) | Vijayanagara (1400 AD) | Hoysala (1340 AD) | Unknown | Total

Each row lists its non-empty cells in column order, ending with the row total.

Vijayanagara (1460 AD): 294, 1, 5, 300
Mysore Wadiyar (1936 AD): 4, 257, 1, 38, 300
Vijayanagara (1400 AD): 9, 264, 27, 300
Hoysala (1340 AD): 5, 2, 293, 300

Table 4

Confusion matrix for SVM classifier

Columns: Age-types (Dynasties) | Vijayanagara (1460 AD) | Mysore Wadiyar (1936 AD) | Vijayanagara (1400 AD) | Hoysala (1340 AD) | Unknown | Total

Each row lists its non-empty cells in column order, ending with the row total.

Vijayanagara (1460 AD): 294, 4, 2, 300
Mysore Wadiyar (1936 AD): 2, 288, 1, 9, 300
Vijayanagara (1400 AD): 7, 287, 6, 300
Hoysala (1340 AD): 3, 1, 5, 291, 300
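
The recognition rates in Table 1 can be checked against the confusion matrices with a short calculation. The sketch below takes the largest count in each row of Tables 3 and 4 as the number of correctly classified images for that dynasty; this reading is an assumption, since the column positions of the individual cells are not recoverable here. The variable and function names are illustrative.

```python
def recognition_rates(correct, totals):
    """Per-class recognition rate in percent."""
    return [100.0 * c / t for c, t in zip(correct, totals)]


def overall_accuracy(correct, totals):
    """Recognition rate over all classes pooled together, in percent."""
    return 100.0 * sum(correct) / sum(totals)


# Row order: Vijayanagara (1460 AD), Mysore Wadiyar (1936 AD),
# Vijayanagara (1400 AD), Hoysala (1340 AD).
# Assumption: the largest entry of each row is the correctly classified count.
totals = [300, 300, 300, 300]
knn_correct = [294, 257, 264, 293]
svm_correct = [294, 288, 287, 291]

knn_rates = [round(r) for r in recognition_rates(knn_correct, totals)]
svm_rates = [round(r) for r in recognition_rates(svm_correct, totals)]
knn_average = round(overall_accuracy(knn_correct, totals), 1)
svm_average = round(overall_accuracy(svm_correct, totals), 1)
```

Under that assumption the rounded per-class rates are 98, 86, 88 and 98 for K-NN and 98, 96, 96 and 97 for SVM, and the pooled accuracies are 1108/1200 = 92.3% and 1160/1200 = 96.7%, in agreement with Table 1.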

4 Conclusion

In this paper, we have proposed an algorithm to restore and recognize historical Kannada handwritten documents by extracting HOG feature descriptors, with accuracy measured using K-NN and SVM classifiers. For the purpose of experimentation, we have considered historical Kannada handwritten document images of different dynasties based on their age-type, i.e. Vijayanagara dynasty (1460 AD), Mysore Wadiyar dynasty (1936 AD), Vijayanagara dynasty (1400 AD) and Hoysala dynasty (1340 AD). The average classification accuracy across these dynasties is 92.3% for the K-NN classifier and 96.7% for the SVM classifier. Based on the experimentation, it is observed that the SVM classifier performs better than the K-NN classifier on historical Kannada handwritten document images. The experimental outcomes are compared with manual results and with other methods in the literature, which shows the thoroughness of the proposed technique.

Notes

Acknowledgements

The authors are indebted to The Chairman, Department of P. G. Studies and Research in Kannada, Gulbarga University, Kalaburgi and Dept. of Hasataprati, Kannada University, Hampi for providing the historical Kannada handwritten documents and perception of manual outcomes.

References

  1. Manjunath, M.G., Devarajaswamy, G.K., Vikasa, K.L.: Jagadhguru Sri Madhvacharya Trust, Sri Raghavendra Swami Matta, Mantralaya
  2. Narasimha Murthy, A.V.: Kannada Lipiya Ugama Mattu Vikasa, Kannada Adhyayana Samsthe, Mysore University, Mysore (1968)
  3. Reddy, D.: Lipiya Huttu Mattu Belavanige: Origin and Evolution of Script, Kannada Pustaka Pradhikara (Kannada Book Authority), Bangalore
  4. Lavrenko, V., Rath, T.M., Manmatha, R.: Holistic word recognition for handwritten documents. In: Proceedings of the First International Workshop on Document Image Analysis for Libraries (DIAL'04), pp. 278–287 (2004)
  5. Sanchez, J.A., Bosch, V., Romero, V., Depuydt, K., de Does, J.: A Handwritten text recognition for historical documents in the tranScriptorium Project. In: Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage, DATeCH '14, pp. 111–117 (2014)
  6. Dileep, D.: A feature extraction technique based on character geometry for character recognition. Comput. Res. Repos. J. 1202(3884), 1–4 (2012)
  7. Romero, V., Serrano, N., Toselli, A.H., Sanchez, J.A., Vidal, E.: Handwritten text recognition for historical documents. In: Proceedings of Language Technologies for Digital heritage and Cultural Heritage Workshop, pp. 90–96
  8. Gao, G., Su, X., Wei, H., Gong, Y.: Classical Mongolian words recognition in historical document. IEEE ICDAR 1520–5363(11), 692–697 (2011)
  9. Soumya, A., Hemantha Kumar, G.: Performance analysis of random forests with SVM and KNN in classification of ancient Kannada scripts. Int. J. Comput. Technol. 13(9), 4907–4921 (2014)
  10. Gangamma, B., Murthy, K.S., Punitha, P.: Curvelet transform based approach for prediction of epigraphical scripts era. In: IEEE International Conference on Computational Intelligence and Computing Research, pp. 1–6 (2012) 978-4673-1344/12
  11. Soumya, A., Hemantha Kumar, G.H.: SVM classifier for the predication of era of an epigraphical script. IJP2P 2(2), 12–22 (2011)
  12. Soumya, A., Hemantha Kumar, G.: Recognition of historical records using Gabor and zonal features. SIPIJ 6(4), 57–69 (2015)
  13. Mohana, H.S., Navya, K., Srikanth, P.C., Shivakumar, G.: Stone inscripted Kannada Character matching Using SIFT. In: Proceedings of the IRF International Conference, pp. 126–131 (2014)
  14. Verma, B., Blumenstein, M., Kulkarni, S.: Recent achievements in offline handwriting recognition system. In: International Conference on Computational Intelligence and Multimedia Applications (1998)
  15. Bannigidad, P., Gudada, C.: Restoration of degraded historical Kannada handwritten document images using image enhancement techniques. In: International Conference on Soft Computing and Pattern Recognition (SoCPaR 2016), pp. 498–508. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-60618-7_49
  16. Bannigidad, P., Gudada, C.: Restoration of degraded Kannada handwritten paper inscriptions (Hastaprati) using image enhancement techniques. In: IEEE International Conference on Computer Communication and Informatics (ICCCI-2017), pp. 1–6 (2017). https://doi.org/10.1109/iccci.2017.8117697
  17.
  18.
  19.
  20. Karthik, S., Srikanta Murthy, K.: Segmentation and recognition of handwritten kannada text using relevance feedback and histogram of oriented gradients: a novel approach. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 7(1), 472–476 (2016)
  21. Alaei, A., Nagabhushan, P., Pal, U.: A benchmark Kannada handwritten document dataset and its segmentation. In: Proceedings of the International Conference on Document Analysis and Recognition, pp. 141–145 (2011)
  22. Belagali, Netravati, Angadi, Shanmukhappa A.: OCR for handwritten Kannada language script. Int. J. Recent Trends Eng. Res. (IJRTER) 02(08), 190–197 (2016)
  23. Sridharamurthy, S.K., Sudarshana Reddy, H.R.: PCA based feature for handwritten Kannada characters recognition. In: International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT) (2015). https://doi.org/10.1109/erect.2015.7499053
  24. Manjunath Aradhya, V.N., Hemantha Kumar, G., Noushath, S.: Multilingual OCR system for South Indian scripts and English documents: an approach based on Fourier transform and principal component analysis. In: Engineering Applications of Artificial Intelligence Elsevier, vol. 21, no. 4, pp. 658–668 (2008)

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Rani Channamma University, Belagavi, India
