RWTH OCR: A Large Vocabulary Optical Character Recognition System for Arabic Scripts

Dreuw, Philippe; Rybach, David; Heigold, Georg; Ney, Hermann

doi:10.1007/978-1-4471-4072-6_9

Abstract

We present a novel large vocabulary OCR system that implements a confidence- and margin-based discriminative training approach for adapting the models of a hidden Markov model (HMM) based recognition system to multiple fonts, different handwriting styles, and their variations. Most current HMM approaches are HTK-based systems that are trained with maximum likelihood (ML) and that try to adapt their models to different writing styles using writer adaptive training, unsupervised clustering, or additional writer-specific data. Here, discriminative training based on the maximum mutual information (MMI) and minimum phone error (MPE) criteria is used instead. For model adaptation during decoding, an unsupervised confidence-based discriminative training within a two-pass decoding process is proposed. Additionally, we use neural network-based features extracted by a hierarchical multi-layer perceptron (MLP) network, either in a hybrid MLP/HMM approach or to discriminatively retrain a Gaussian HMM system in a tandem approach. The proposed framework and methods are evaluated for closed-vocabulary isolated handwritten word recognition on the IFN/ENIT Arabic handwriting database, where the word error rate is decreased by more than 50% relative to an ML-trained baseline system. Preliminary results for large vocabulary Arabic machine-printed text recognition tasks are presented on a novel publicly available newspaper database.
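
The abstract reports its main result as a relative reduction in word error rate (WER). As a reading aid, the short sketch below shows how such a relative reduction is usually computed. The function name and the two WER values are hypothetical and chosen only for illustration; they are not figures from the chapter.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical values for illustration only (not results from the chapter).
baseline = 20.0  # WER in percent of an assumed ML-trained baseline
improved = 9.0   # WER in percent of an assumed discriminatively trained system

reduction = relative_wer_reduction(baseline, improved)
print(f"{reduction:.0%} relative reduction")
```

With these assumed values the absolute gain is 11 percentage points, while the relative reduction exceeds one half, which is the sense in which a "more than 50% relative" improvement is normally read.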

Acknowledgements

We would like to thank Christian Plahl, Stefan Hahn, Simon Wiesler, Patrick Doetsch, Robert Pyttel, Stephan Jonas, Jens Forster, Christian Gollan, and Thomas Deselaers for their support.

This work was partly realized as part of the Google Research Award “Robust Recognition of Machine Printed Text” and as part of the Quaero Programme, funded by OSEO, the French State agency for innovation.

Author information

Authors and Affiliations

Human Language Technology and Pattern Recognition, RWTH Aachen University, Ahornstr 55, 52056, Aachen, Germany
Philippe Dreuw, David Rybach, Georg Heigold & Hermann Ney

Corresponding author

Correspondence to Philippe Dreuw.

Editor information

Editors and Affiliations

Institute for Communications Technology, Braunschweig Technical University, Schleinitzstraße 22, Braunschweig, 38106, Niedersachsen, Germany
Volker Märgner & Haikal El Abed

About this chapter

Cite this chapter

Dreuw, P., Rybach, D., Heigold, G., Ney, H. (2012). RWTH OCR: A Large Vocabulary Optical Character Recognition System for Arabic Scripts. In: Märgner, V., El Abed, H. (eds) Guide to OCR for Arabic Scripts. Springer, London. https://doi.org/10.1007/978-1-4471-4072-6_9

DOI: https://doi.org/10.1007/978-1-4471-4072-6_9
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4071-9
Online ISBN: 978-1-4471-4072-6
eBook Packages: Computer Science, Computer Science (R0)