A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling for Handwriting Recognition

Bluche, Théodore; Ney, Hermann; Kermorvant, Christopher

doi:10.1007/978-3-319-11397-5_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8791)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) are the current state of the art in handwriting recognition. In speech recognition, Deep Multi-Layer Perceptrons (DeepMLPs) have become the standard acoustic model for Hidden Markov Models (HMMs). Although handwriting and speech recognition systems tend to include similar components and techniques, DeepMLPs are not used as optical models in unconstrained large-vocabulary handwriting recognition. In this paper, we compare Bidirectional LSTM-RNNs with DeepMLPs for this task. We carried out experiments on two public databases of multi-line handwritten documents: Rimes and IAM. We show that the proposed hybrid systems yield performance comparable to the state of the art, regardless of the type of features (hand-crafted or pixel values) and the neural network optical model (DeepMLP or RNN).

Acknowledgments

The authors would like to thank Michal Kozielski and his colleagues from RWTH for providing the language model used in the IAM experiments. This work was partly achieved as part of the Quaero Program, funded by OSEO, the French State agency for innovation, and was supported by the French Research Agency under the contract Cognilego ANR 2010-CORD-013.

Author information

Authors and Affiliations

A2iA SA, Paris, France
Théodore Bluche & Christopher Kermorvant
Spoken Language Processing Group, LIMSI CNRS, Orsay, France
Théodore Bluche & Hermann Ney
Human Language Technology and Pattern Recognition, RWTH Aachen University, Aachen, Germany
Hermann Ney

Corresponding author

Correspondence to Théodore Bluche.

Editor information

Editors and Affiliations

University Joseph Fourier, Grenoble, France
Laurent Besacier
Rovira i Virgili University, Tarragona, Spain
Adrian-Horia Dediu
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide

Cite this paper

Bluche, T., Ney, H., Kermorvant, C. (2014). A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling for Handwriting Recognition. In: Besacier, L., Dediu, AH., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2014. Lecture Notes in Computer Science, vol 8791. Springer, Cham. https://doi.org/10.1007/978-3-319-11397-5_15

DOI: https://doi.org/10.1007/978-3-319-11397-5_15
Published: 03 September 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11396-8
Online ISBN: 978-3-319-11397-5
eBook Packages: Computer Science, Computer Science (R0)
