A Robust OCR for Degraded Documents

  • Kapil Dev Dhingra
  • Sudip Sanyal
  • Pramod Kumar Sharma
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 4)

Over the last two decades, many advances have been made in document image analysis and recognition. Several methods for recognizing Latin, Chinese, Japanese, and Arabic scripts have been proposed in the recent past [7–9]. So far, most optical character recognition (OCR) work has concentrated on high-quality images, where character recognition systems have achieved great success. Despite these successes, two challenging problems remain open. The first is OCR of low-quality images: images with luminance variations, noise, and random degradation of text are difficult for OCR systems to read. The second is off-line recognition of cursive handwritten characters [15]. Our work concentrates on the former, specifically for the Devanagari script, which is used to write Hindi, Nepali, Marathi, and several other Indic languages. Together, these languages have a user base exceeding 500 million people.


Machine Intelligence, Character Recognition, Document Image, Optical Character Recognition, High Dimensional Feature Space
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Bansal V, Sinha RMK (2001) A Devanagari OCR and a brief review of OCR research for Indian scripts. Proceedings of STRANS01
  2. Chaudhuri BB, Pal U (1997) An OCR system to read two Indian language scripts. Proc. of 4th Int. Conf. on Document Analysis and Recognition, 1011–1015
  3. Negi A, Bhagvati C, Krishna B (2001) An OCR System for Telugu. ICDAR, 1110
  4. Jawahar CV, Pavan Kumar MNSSK, Ravi Kiran SS (2003) A Bilingual OCR for Hindi-Telugu Documents and its Applications. ICDAR, 408–412
  5. Wang X, Ding X, Liu C (2002) Optimized Gabor Filters Based Feature Extraction for Character Recognition. Proc. 16th International Conference on Pattern Recognition, 223–226
  6. Huo Q, Ge Y, Feng ZD (2001) High Performance Chinese OCR Based on Gabor Features, Discriminative Feature Extraction and Model Training. Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 1517–1520
  7. Mantas J (1986) An Overview of Character Recognition Methodologies. Pattern Recognition 19:425–430
  8. Bozinovic RM, Srihari SN (1989) Offline Cursive Script Word Recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence 11:68–83
  9. Mori S, Suen CY, Yamamoto K (1992) Historical Review of OCR Research and Development. Proc. of IEEE 80:1029–1058
  10. Nagy G (2000) Twenty Years of Document Image Analysis in Pattern Analysis and Machine Intelligence. IEEE Trans. on Pattern Analysis and Machine Intelligence 22:38–62
  11. Zhang J, Yan Y, Lades M (1997) Face recognition: Eigenface, Elastic Matching, and Neural Nets. Proc. of IEEE 85:1423–1435
  12. Juang BH, Katagiri S (1992) Discriminative Learning for Minimum Error Classification. IEEE Trans. on Signal Processing 4:3043–3054
  13. Wang X, Ding X, Liu C (2005) Gabor Based Feature Extraction for Character Recognition. Pattern Recognition 38:369–379
  14. Biem A (2006) Minimum Classification Error Training for Online Handwriting Recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence 28:1041–1051
  15. Plamondon R, Srihari SN (2000) On-line and Off-line Handwriting Recognition: A Comprehensive Survey. IEEE Trans. on Pattern Analysis and Machine Intelligence 22:63–84
  16. Kanungo T et al. (2000) A Statistical, Nonparametric Methodology for Document Degradation Model Validation. IEEE Trans. on Pattern Analysis and Machine Intelligence 20:1209–1223
  17. Chaudhuri BB, Pal U (1997) Skew Angle Detection of Digitized Indian Script Documents. IEEE Trans. on Pattern Analysis and Machine Intelligence 19:182–186
  18. Maurer CR, Qi R, Raghavan V (2003) A Linear Time Algorithm for Computing Exact Euclidean Distance Transforms of Binary Images in Arbitrary Dimensions. IEEE Trans. on Pattern Analysis and Machine Intelligence 25:265–270

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Kapil Dev Dhingra (1)
  • Sudip Sanyal (1)
  • Pramod Kumar Sharma (1)
  1. Indian Institute of Information Technology Allahabad, Universal Digital Library Research Lab, India
