Advertisement

Sādhanā

, 44:141 | Cite as

Devanagari ancient documents recognition using statistical feature extraction techniques

  • Sonika Narang
  • M K Jindal
  • Munish KumarEmail author
Article
  • 15 Downloads

Abstract

Devanagari ancient document recognition process is drawing a lot of consideration from researchers nowadays. These ancient documents contain a wealth of knowledge. However, these documents are not available to all because of their fragile condition. A Devanagari ancient manuscript recognition system is designed for digital archiving. This system includes image binarization, character segmentation and recognition phases. It incorporates automatic recognition of scanned and segmented characters. Segmented characters may include basic characters (vowels and consonants), modifiers (matras) and various compound characters (characters formed by joining more than one basic characters). In this paper, handwritten Devanagari ancient manuscripts recognition system has been presented using statistical features extraction techniques. In feature extraction phase, intersection points, open endpoints, centroid, horizontal peak extent and vertical peak extent features are extracted. For classification, Convolutional Neural Network, Neural Network, Multilayer Perceptron, RBF-SVM and random forest techniques are considered in this work. Various feature extraction and classification techniques are considered and compared to the recognition of basic characters segmented from Devanagari ancient manuscripts. A data set, of 6152 pre-segmented samples of Devanagari ancient documents, is considered for experimental work. Authors have achieved 88.95% recognition accuracy using a combination of all features and a combination of all classifiers considered in this work by a simple majority voting scheme.

Keywords

Ancient manuscripts Devanagari historical documents feature extraction classification 

References

  1. 1.
    Shah K R and Badgujar D D 2013 Devnagari handwritten character recognition (DHCR) for ancient documents: a review. In: Proceedings of the 2013 IEEE Conference on Information and Communication Technology. 656–660Google Scholar
  2. 2.
    Sarkar R, Malakar S, Das N, Basu S and Nasipuri M 2010 A script independent technique for extraction of characters from handwritten word images. Int. J. Comput. Appl. 1(23): 83–88CrossRefGoogle Scholar
  3. 3.
    Kleber F, Sablatnig R, Gau M and Miklas H 2008 Ancient document analysis based on text line extraction. In: Proceedings of the 19th International Conference on Pattern Recognition. 1–4Google Scholar
  4. 4.
    Bansal V and Sinha R M K 2001 A complete OCR for printed Hindi text in Devanagari script. In: Proceedings of the 6th International Conference on Document Analysis and Recognition. 800–804Google Scholar
  5. 5.
    Kim M S, Jang M D, Choi H L, Rhee T H, Kim J H and Kwag H K 2004 Digitalizing scheme of handwritten Hanja historical documents. In: Proceedings of the First International Workshop on Document Image Analysis for Libraries. 321–327Google Scholar
  6. 6.
    Sousa J M C, Pinto J R C, Ribeiro C S and Gil J M 2005 Ancient document recognition using fuzzy methods. In: Proceedings of the IEEE International Conference on Fuzzy Systems. 833–836Google Scholar
  7. 7.
    Cecotti H and Belaid A 2005 Hybrid OCR combination approach complemented by a specialized ICR applied on ancient documents. In: Proceedings of the 8th International Conference on Document Analysis and Recognition. 1045–1049Google Scholar
  8. 8.
    Diem M and Sablatnig R 2009 Recognition of degraded handwritten characters using local features. In: Proceedings of the 10th International Conference on Document Analysis and Recognition. 221–225Google Scholar
  9. 9.
    Raghuraj S, Yadav C S, Verma P and Yadav V 2010 Optical Character Recognition (OCR) for printed Devanagari script using artificial neural network. Int. J.Computer Science & Communication 1(1): 91–95Google Scholar
  10. 10.
    Holambe A N, Thool R C and Jagade S M 2011 A brief review and survey of feature extraction methods for Devnagari OCR. In: Proceedings of the 9th International Conference on ICT and Knowledge Engineering. 99–104Google Scholar
  11. 11.
    Yadav D, Sánchez-Cuadrado S and Morato J 2013 OCR for Hindi language using a neural network approach. J. Inf. Process. Syst. 9(1): 117–140CrossRefGoogle Scholar
  12. 12.
    Yunxue S Y, Wang C and Xiao B 2015 A character image restoration method for unconstrained handwritten Chinese character recognition. Int. J. Doc. Anal. Recognit. 18(1): 73–86CrossRefGoogle Scholar
  13. 13.
    Katiyar G and Mehfuz S 2016 A hybrid recognition system for off-line handwritten characters. SpringerPlus 5: 1–18CrossRefGoogle Scholar
  14. 14.
    Belhe S, Paulzagade C, Deshmukh A, Jetley S and Mehrotra K 2012 Hindi handwritten word recognition using HMM and symbol tree. In: Proceedings of the Workshop on Document Analysis and Recognition (DAR). 9–14Google Scholar
  15. 15.
    Lehal G S and Singh C 1999 Feature extraction and classification for OCR of Gurmukhi script. Vivek 12(2): 2–12Google Scholar
  16. 16.
    Kumar M, Sharma R K and Jindal M K 2013 A novel feature extraction technique for offline handwritten Gurmukhi character recognition. IETE J. Res. 59(6): 687–692CrossRefGoogle Scholar
  17. 17.
    Kumar M, Sharma R K and Jindal M K 2014 Efficient feature extraction techniques for offline handwritten Gurmukhi character recognition. Natl. Acad. Sci. Lett. 37(4): 381–391CrossRefGoogle Scholar
  18. 18.
    Kumar M, Sharma R K and Jindal M K 2018 Character and numeral recognition for non-Indic and Indic scripts: a survey. Artif. Intell. Rev.  https://doi.org/10.1007/s10462-017-9607-x
  19. 19.
    Kumar M, Jindal M K and Sharma R K 2012 Offline handwritten Gurmukhi character recognition: study of different features and classifiers combinations. In: Proceedings of the International Workshop on Document Analysis and Recognition, IIT Bombay. 94–99Google Scholar
  20. 20.
    Elleuch M, Maalej R and Kherallah M 2016 A new design based-SVM of the CNN classifier architecture with dropout for offline Arabic handwritten recognition. Procedia Computer Science 80: 1712–1723CrossRefGoogle Scholar
  21. 21.
    Lecun Y, Bottou L, Bengio Y and Haffner P 1998 Gradient-based learning applied to document recognition. Proc. IEEE 86(11): 2278–2324CrossRefGoogle Scholar
  22. 22.
    Niu X Y, Xia L Y, Wang T X and Zhang X Y 2010 Application of BP-ANN and LS-SVM to discrimination of rice origin based on trace metals. Proc. Int. Conf. Mach. Learn. Cybern. 3: 1426–1430Google Scholar
  23. 23.
    Zhang Y, Liu B and Yang F 2016 Differential evolution based selective ensemble of extreme learning machine. In: IEEE Trustcom/Bgdatase/ispa. 1327–1333Google Scholar
  24. 24.
    Kumar M, Sharma R K and Jindal M K 2014 A novel hierarchical technique for offline handwritten Gurmukhi character recognition. Natl. Acad. Sci. Lett. 37(6): 567–572CrossRefGoogle Scholar

Copyright information

© Indian Academy of Sciences 2019

Authors and Affiliations

  1. 1.Department of Computer ScienceDAV CollegeAboharIndia
  2. 2.Department of Computer Science and ApplicationsPanjab University Regional CentreMuktsarIndia
  3. 3.Department of Computational SciencesMaharaja Ranjit Singh Punjab Technical UniversityBathindaIndia

Personalised recommendations