Abstract
We present a novel large vocabulary OCR system that implements a confidence- and margin-based discriminative training approach for adapting the models of an HMM-based recognition system to multiple fonts, different handwriting styles, and their variations. Most current HMM approaches are HTK-based systems that are trained under the maximum likelihood (ML) criterion and adapt their models to different writing styles using writer adaptive training, unsupervised clustering, or additional writer-specific data. Here, discriminative training based on the maximum mutual information (MMI) and minimum phone error (MPE) criteria is used instead. For model adaptation during decoding, we propose an unsupervised confidence-based discriminative training within a two-pass decoding process. Additionally, we use neural network-based features extracted by a hierarchical multi-layer perceptron (MLP) network, either in a hybrid MLP/HMM approach or to discriminatively retrain a Gaussian HMM system in a tandem approach. The proposed framework and methods are evaluated for closed-vocabulary isolated handwritten word recognition on the IFN/ENIT Arabic handwriting database, where the word error rate is decreased by more than 50% relative to an ML-trained baseline system. Preliminary results for large vocabulary Arabic machine-printed text recognition are presented on a novel publicly available newspaper database.
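The following minimal sketch is not the chapter's implementation. It shows how a margin-modified MMI criterion can be combined with confidence-based selection of first-pass output, in the spirit of the two-pass unsupervised adaptation described above. It makes several assumptions. Hypotheses are whole words with precomputed joint log-scores log p(x, c), as in isolated word recognition. The accuracy between a hypothesis and the reference is a 0/1 Kronecker delta. A first-pass output's confidence is taken to be the posterior of the 1-best hypothesis. The function names, the scale `gamma`, the margin `rho`, and the confidence threshold are hypothetical choices for illustration. Real systems compute these quantities on lattices and optimize them with gradient-based or extended Baum-Welch updates, which are not shown.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def modified_mmi(scores, reference, gamma=1.0, rho=0.0):
    """Margin-modified MMI term for one sample (illustrative sketch).

    scores:    dict mapping hypothesis -> joint log-score log p(x, c)
    reference: hypothesis treated as correct (transcription or 1st-pass output)
    gamma:     scaling factor applied to the log-scores
    rho:       margin; rho = 0 gives the plain (scaled) MMI log-posterior
    """
    # Assumption: accuracy A(c, reference) is a 0/1 delta (whole-word case).
    # Subtracting rho * A from every score penalizes only the reference.
    modified = {c: gamma * (s - rho * (1.0 if c == reference else 0.0))
                for c, s in scores.items()}
    return modified[reference] - log_sum_exp(list(modified.values()))


def confidence(scores, gamma=1.0):
    """Return the 1-best hypothesis and its posterior, used as confidence."""
    best = max(scores, key=scores.get)
    return best, math.exp(modified_mmi(scores, best, gamma=gamma))


def unsupervised_objective(first_pass, threshold, gamma=1.0, rho=0.5):
    """Sum modified MMI terms over first-pass outputs whose confidence
    reaches the threshold; low-confidence samples are skipped, so their
    possibly wrong 1-best labels do not drive the adaptation."""
    total, used = 0.0, 0
    for scores in first_pass:
        best, conf = confidence(scores, gamma)
        if conf >= threshold:
            total += modified_mmi(scores, best, gamma=gamma, rho=rho)
            used += 1
    return total, used
```

Setting `rho=0` recovers the plain MMI log-posterior. A positive `rho` lowers the reference's score, so the term stays well below its maximum of 0 unless the reference outscores its competitors by more than the margin.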
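The following minimal sketch makes the hybrid/tandem distinction concrete. It assumes an MLP that already outputs per-frame HMM state posteriors p(s|x), together with known state priors p(s). In the hybrid approach, these posteriors replace the Gaussian emission model after division by the priors, which by Bayes' rule gives p(x|s) up to a state-independent factor. In the tandem approach, the log posteriors instead become additional input features for a Gaussian HMM, which can then be retrained discriminatively. The function names, the `prior_scale` exponent, and the probability floor are illustrative assumptions. The sketch does not model the hierarchical MLP structure or the feature projection used in practice.

```python
import math


def hybrid_emission_scores(posteriors, priors, prior_scale=1.0, floor=1e-10):
    """Hybrid MLP/HMM: convert MLP state posteriors p(s|x) into scaled
    log-likelihoods log p(s|x) - prior_scale * log p(s). These stand in for
    log p(x|s) up to a constant that is the same for every state."""
    return [math.log(max(p, floor)) - prior_scale * math.log(max(q, floor))
            for p, q in zip(posteriors, priors)]


def tandem_features(raw_features, posteriors, floor=1e-10):
    """Tandem: append log MLP posteriors to the original feature vector.
    A Gaussian HMM is then trained on the result. In practice a decorrelating
    projection (e.g. PCA) is usually applied to the log posteriors first;
    that step is omitted here."""
    return list(raw_features) + [math.log(max(p, floor)) for p in posteriors]
```

The hybrid variant keeps the MLP as the acoustic (here: optical) model. The tandem variant keeps the Gaussian HMM and only changes its input features, which is what allows the MMI/MPE training discussed above to be reused unchanged.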
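The main result above is stated as a relative reduction in word error rate. The small sketch below shows how these quantities are usually computed. WER is the word-level Levenshtein distance (substitutions, deletions, insertions) divided by the reference length. Relative reduction is the drop in WER divided by the baseline WER. In isolated word recognition, as on IFN/ENIT, each sample is a single word, so WER reduces to the fraction of misrecognized words. The function names are hypothetical, and the inputs exercised here are toy strings, not data from the chapter.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference
    words. Assumes a non-empty reference string."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from the empty reference
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of r
                         cur[j - 1] + 1,          # insertion of h
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Relative improvement over the baseline; 0.5 means the WER was halved."""
    return (baseline_wer - new_wer) / baseline_wer
```

A relative reduction of more than 0.5 therefore means the final system makes fewer than half as many word errors as the ML-trained baseline.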
Acknowledgements
We would like to thank Christian Plahl, Stefan Hahn, Simon Wiesler, Patrick Doetsch, Robert Pyttel, Stephan Jonas, Jens Forster, Christian Gollan, and Thomas Deselaers for their support.
This work was carried out in part within the Google Research Award “Robust Recognition of Machine Printed Text” and within the Quaero Programme, funded by OSEO, the French State agency for innovation.
Copyright information
© 2012 Springer-Verlag London
About this chapter
Cite this chapter
Dreuw, P., Rybach, D., Heigold, G., Ney, H. (2012). RWTH OCR: A Large Vocabulary Optical Character Recognition System for Arabic Scripts. In: Märgner, V., El Abed, H. (eds) Guide to OCR for Arabic Scripts. Springer, London. https://doi.org/10.1007/978-1-4471-4072-6_9
DOI: https://doi.org/10.1007/978-1-4471-4072-6_9
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4071-9
Online ISBN: 978-1-4471-4072-6
eBook Packages: Computer Science, Computer Science (R0)