RWTH OCR: A Large Vocabulary Optical Character Recognition System for Arabic Scripts

Chapter in: Guide to OCR for Arabic Scripts

Abstract

We present a novel large vocabulary OCR system that implements a confidence- and margin-based discriminative training approach for adapting the models of an HMM-based recognition system to multiple fonts, different handwriting styles, and their variations. Most current HMM approaches are HTK-based systems that are trained with the maximum likelihood (ML) criterion and adapt their models to different writing styles using writer adaptive training, unsupervised clustering, or additional writer-specific data. Here, discriminative training based on the maximum mutual information (MMI) and minimum phone error (MPE) criteria is used instead. For model adaptation during decoding, we propose unsupervised confidence-based discriminative training within a two-pass decoding process. Additionally, we use neural network-based features extracted by a hierarchical multi-layer perceptron (MLP) network, either in a hybrid MLP/HMM approach or to discriminatively retrain a Gaussian HMM system in a tandem approach. The proposed framework and methods are evaluated for closed-vocabulary isolated handwritten word recognition on the IFN/ENIT Arabic handwriting database, where the word error rate is decreased by more than 50% relative to an ML-trained baseline system. Preliminary results for large vocabulary Arabic machine-printed text recognition tasks are presented on a novel, publicly available newspaper database.
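
The word "relative" in the reported error reduction can be made concrete with a short arithmetic sketch. The function name and the two error rates below are hypothetical, chosen only for illustration; they are not results from the chapter.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the word error rate (WER) reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical error rates in percent (assumed values, not from the chapter).
baseline_wer = 20.0   # ML-trained baseline system
adapted_wer = 9.0     # discriminatively trained and adapted system

reduction = relative_wer_reduction(baseline_wer, adapted_wer)
print(f"relative WER reduction: {reduction:.1%}")
```

Under these assumed numbers the absolute reduction is 11 percentage points, while the relative reduction is measured against the baseline error rate, which is the kind of figure the abstract quotes.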

Acknowledgements

We would like to thank Christian Plahl, Stefan Hahn, Simon Wiesler, Patrick Doetsch, Robert Pyttel, Stephan Jonas, Jens Forster, Christian Gollan, and Thomas Deselaers for their support.

This work was realized in part under the Google Research Award “Robust Recognition of Machine Printed Text” and in part under the Quaero Programme, funded by OSEO, the French State agency for innovation.

Author information

Correspondence to Philippe Dreuw.

Copyright information

© 2012 Springer-Verlag London

About this chapter

Cite this chapter

Dreuw, P., Rybach, D., Heigold, G., Ney, H. (2012). RWTH OCR: A Large Vocabulary Optical Character Recognition System for Arabic Scripts. In: Märgner, V., El Abed, H. (eds) Guide to OCR for Arabic Scripts. Springer, London. https://doi.org/10.1007/978-1-4471-4072-6_9

  • DOI: https://doi.org/10.1007/978-1-4471-4072-6_9

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4071-9

  • Online ISBN: 978-1-4471-4072-6

  • eBook Packages: Computer Science, Computer Science (R0)
