A font and size-independent OCR system for printed Kannada documents using support vector machines
This paper describes an OCR system for printed text documents in Kannada, a South Indian language. The input to the system is the scanned image of a page of text, and the output is a machine-editable file compatible with most typesetting software. The system first extracts words from the document image and then segments the words into sub-character-level pieces. The segmentation algorithm is motivated by the structure of the script. We propose a novel set of features for the recognition problem that are computationally simple to extract. The final recognition is achieved by employing a number of 2-class classifiers based on the Support Vector Machine (SVM) method. The recognition is independent of the font and size of the printed text, and the system is seen to deliver reasonable performance.
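The abstract says only that final recognition uses "a number of 2-class classifiers" built with SVMs. It does not say how their outputs are merged into a decision among many character classes. The sketch below assumes one common scheme, one-vs-one majority voting: a classifier is trained for every pair of classes, and each one casts a vote. The linear decision functions, the class labels `c0` to `c2`, and the helper names `linear_decision` and `one_vs_one_predict` are hypothetical stand-ins for trained SVMs. This is not the authors' implementation.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: a trained 2-class decision function f(x). A positive
# score selects the first class of the pair; otherwise the second is selected.
DecisionFn = Callable[[Sequence[float]], float]


def linear_decision(w: Sequence[float], b: float) -> DecisionFn:
    """Build a linear-kernel SVM-style decision function f(x) = w.x + b."""
    def f(x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return f


def one_vs_one_predict(
    x: Sequence[float],
    classes: List[str],
    pairwise: Dict[Tuple[str, str], DecisionFn],
) -> str:
    """Combine 2-class classifiers by majority vote (one-vs-one scheme).

    pairwise[(a, b)] must exist for every pair (a, b) produced by
    combinations(classes, 2). A positive score is a vote for a, else for b.
    """
    votes = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        score = pairwise[(a, b)](x)
        votes[a if score > 0 else b] += 1
    # Ties are broken by the order of `classes` (earliest class wins).
    best = max(votes[c] for c in classes)
    return next(c for c in classes if votes[c] == best)


if __name__ == "__main__":
    # Toy 2-D features: class c0 lies at x < 1, c1 at 1 < x < 2, c2 at x > 2.
    classes = ["c0", "c1", "c2"]
    pairwise = {
        ("c0", "c1"): linear_decision([-1.0, 0.0], 1.0),
        ("c0", "c2"): linear_decision([-1.0, 0.0], 1.5),
        ("c1", "c2"): linear_decision([-1.0, 0.0], 2.0),
    }
    for point in ([0.5, 0.0], [1.6, 0.0], [3.0, 0.0]):
        # Prints c0, c1 and c2 in turn.
        print(point, one_vs_one_predict(point, classes, pairwise))
```

With k classes, this scheme needs k(k-1)/2 pairwise classifiers, which becomes the main cost when the number of sub-character classes is large. One-vs-rest (k classifiers) and tree-structured combinations are common alternatives.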
Keywords: OCR; pattern recognition; support vector machines; Kannada script