Volume 27, Issue 1, pp 35–58

A font and size-independent OCR system for printed Kannada documents using support vector machines

  • T. V. Ashwin
  • P. S. Sastry


This paper describes an OCR system for printed text documents in Kannada, a South Indian language. The input to the system is the scanned image of a page of text, and the output is a machine-editable file compatible with most typesetting software. The system first extracts words from the document image and then segments the words into sub-character-level pieces. The segmentation algorithm is motivated by the structure of the script. We propose a novel set of features for the recognition problem which are computationally simple to extract. The final recognition is achieved by employing a number of 2-class classifiers based on the Support Vector Machine (SVM) method. The recognition is independent of the font and size of the printed text, and the system is seen to deliver reasonable performance.
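As an illustration of how a number of 2-class classifiers can jointly produce a multi-class decision, the following is a minimal sketch using pairwise (one-vs-one) voting. The abstract does not state how the system combines its classifiers, so the voting scheme is an assumption; the names `linear_decision` and `predict_by_voting`, the three labels, and the hand-picked weights are all hypothetical, and simple linear decision functions stand in for trained SVMs.

```python
from collections import Counter
from itertools import combinations


def linear_decision(weights, bias):
    """Build a 2-class decision function f(x) = w.x + b.

    Hypothetical stand-in for a trained SVM's decision function.
    """
    def decide(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return decide


def predict_by_voting(x, classes, pairwise):
    """Combine 2-class classifiers by pairwise voting (an assumed scheme).

    pairwise[(a, b)] returns a score: positive votes for a, otherwise b.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        score = pairwise[(a, b)](x)
        votes[a if score > 0 else b] += 1
    # Iterate over `classes` so ties resolve to the earliest listed label.
    return max(classes, key=lambda c: votes[c])


# Toy setup (assumed): three labels and 2-D feature vectors.
classes = ["ka", "ga", "ma"]
pairwise = {
    ("ka", "ga"): linear_decision([1.0, -1.0], 0.0),   # x0 > x1  -> "ka"
    ("ka", "ma"): linear_decision([1.0, 0.0], -0.5),   # x0 > 0.5 -> "ka"
    ("ga", "ma"): linear_decision([0.0, 1.0], -0.5),   # x1 > 0.5 -> "ga"
}

label = predict_by_voting([0.9, 0.1], classes, pairwise)
print(label)
```

With k labels this scheme needs k(k-1)/2 two-class classifiers; a one-vs-rest arrangement would need only k, at the cost of less balanced training sets.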


OCR; pattern recognition; support vector machines; Kannada script





Copyright information

© Indian Academy of Sciences 2002

Authors and Affiliations

  • T. V. Ashwin (1)
  • P. S. Sastry (1)
  1. Department of Electrical Engineering, Indian Institute of Science, Bangalore, India
