UHTelPCC: A Dataset for Telugu Printed Character Recognition

  • Rakesh Kummari
  • Chakravarthy Bhagvati
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1037)


This paper describes the creation and characteristics of UHTelPCC, a dataset for Telugu printed character recognition. The dataset is built from characters extracted from images of printed Telugu texts from the period 1950–1990, so it is hoped that it provides a basis for developing practical Telugu OCR systems. UHTelPCC is intended to provide a standard benchmark for comparing different Telugu OCR algorithms and to support research and development of Telugu OCR systems. It contains 70K samples of 325 classes, divided into training, validation, and test sets of 50K, 10K, and 10K samples respectively. It is hoped that UHTelPCC will serve Telugu printed character recognition as MNIST serves handwritten digit recognition. The baseline performance on the test set using KNN, MLP, and CNN is 98.85%, 99.52%, and 99.68% respectively. UHTelPCC is available at
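
The split sizes and baseline figures above can be illustrated with a minimal sketch. Only the counts (70K samples, 325 classes, a 50K/10K/10K split) come from the abstract. Everything else is an assumption made for illustration: the random shuffle, the helper names `split_indices`, `accuracy`, and `nearest_neighbor`, the representation of a character image as a flat tuple of 0/1 pixels, and the choice of 1-nearest-neighbor with Hamming distance. The abstract does not state how the split was produced, which distance or value of k the KNN baseline uses, or that the reported percentages are classification accuracies.

```python
import random

# Counts stated in the abstract; all other choices below are illustrative assumptions.
N_TRAIN, N_VAL, N_TEST = 50_000, 10_000, 10_000
N_CLASSES = 325


def split_indices(n_samples, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts.

    Hypothetical helper: the dataset's real split procedure is not described here.
    """
    if n_samples != N_TRAIN + N_VAL + N_TEST:
        raise ValueError("expected 70,000 samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return idx[:N_TRAIN], idx[N_TRAIN:N_TRAIN + N_VAL], idx[N_TRAIN + N_VAL:]


def accuracy(predicted, actual):
    """Percentage of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def nearest_neighbor(train_x, train_y, query):
    """Label of the training image closest to `query` in Hamming distance.

    Images are assumed to be equal-length tuples of 0/1 pixels (k = 1).
    """
    def hamming(a, b):
        return sum(p != q for p, q in zip(a, b))

    best = min(range(len(train_x)), key=lambda i: hamming(train_x[i], query))
    return train_y[best]
```

Read as accuracies on a test set of exactly 10,000 samples, the reported 98.85%, 99.52%, and 99.68% would correspond to roughly 115, 48, and 32 misclassified characters respectively.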


Optical Character Recognition; OCR; Printed Telugu OCR; UHTelPCC; Telugu dataset; OCR dataset; Telugu character dataset



We thank Amit Patel for his efforts in labeling connected components. The first author acknowledges the financial support received from the Council of Scientific and Industrial Research (CSIR), Government of India in the form of a Junior Research Fellowship.



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. School of Computer and Information Sciences, University of Hyderabad, Hyderabad, India
