Are 2D-LSTM really dead for offline text recognition?

  • Bastien Moysset
  • Ronaldo Messina
Special Issue Paper


There is a recent trend in handwritten text recognition with deep neural networks to replace 2D recurrent layers with 1D layers, and in some cases to remove the recurrent layers entirely, relying on simple feed-forward, convolutional-only architectures. The most widely used type of recurrent layer is the long short-term memory (LSTM). The motivations for this shift are many: there are few open-source implementations of 2D-LSTM, and even fewer with GPU support (currently cuDNN implements only 1D-LSTM); 2D recurrences reduce the amount of computation that can be parallelized and thus potentially increase training and inference time; and recurrences create global dependencies with respect to the input, which is not always desirable. Yet many recent competitions were won by systems employing networks with 2D-LSTM layers. Most previous works comparing 1D or purely feed-forward architectures to 2D recurrent models have done so on simple datasets, or did not fully optimize the “baseline” 2D model while the challenger model was duly optimized. In this work, we aim at a fair comparison between 2D models and their competitors, and we extensively evaluate them on complex datasets that are more representative of challenging “real-world” data than “academic” datasets of restricted complexity. We seek to determine when and why the 1D and 2D recurrent models produce different results. We also compare the results with a language model to assess whether linguistic constraints level the performance of the different networks. Our results show that on challenging datasets, 2D-LSTM networks still appear to provide the highest performance, and we propose a visualization strategy to explain why.
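The parallelism argument in the abstract can be illustrated with a minimal sketch (our own simplified linear recurrence, not the paper's actual LSTM model, and the weights `w_in`, `w_left`, `w_top` are arbitrary illustrative values): in a 1D recurrence each cell depends only on its left neighbour, so whole rows can be processed in parallel, whereas in a 2D recurrence each cell also depends on the cell above it, so at best the cells on one anti-diagonal can be computed simultaneously.

```python
# Minimal sketch of the dependency structure of 1D vs 2D recurrences.
# This is a plain linear recurrence for illustration, not an LSTM.

def recur_1d(x, w_in=0.5, w_left=0.5):
    """1D recurrence: h[i][j] depends only on h[i][j-1] in the same row,
    so all rows are independent and can run in parallel."""
    h = [[0.0] * len(row) for row in x]
    for i, row in enumerate(x):
        for j, v in enumerate(row):
            left = h[i][j - 1] if j > 0 else 0.0
            h[i][j] = w_in * v + w_left * left
    return h

def recur_2d(x, w_in=0.4, w_left=0.3, w_top=0.3):
    """2D recurrence: h[i][j] depends on h[i][j-1] AND h[i-1][j], so only
    cells on the same anti-diagonal can be computed simultaneously."""
    h = [[0.0] * len(row) for row in x]
    for i, row in enumerate(x):
        for j, v in enumerate(row):
            left = h[i][j - 1] if j > 0 else 0.0
            top = h[i - 1][j] if i > 0 else 0.0
            h[i][j] = w_in * v + w_left * left + w_top * top
    return h

grid = [[1.0] * 4 for _ in range(3)]
h1 = recur_1d(grid)   # every row identical: rows are mutually independent
h2 = recur_2d(grid)   # each cell also carries vertical context from above
```

On the constant grid above, all rows of `h1` are identical (each row is an independent 1D scan), while `h2` differs from row to row because vertical context accumulates; this cross-row coupling is what serializes the computation.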


Keywords: Text line recognition · Neural network · Recurrent · 2D-LSTM · 1D-LSTM · Convolutional




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. A2iA SA, Paris, France
