
A transformer-based approach for Arabic offline handwritten text recognition

  • Original Paper
  • Published:
Signal, Image and Video Processing

Abstract

Handwriting recognition is a challenging and critical problem in the fields of pattern recognition and machine learning, with applications spanning a wide range of domains. In this paper, we focus on the specific issue of recognizing offline Arabic handwritten text. Existing approaches typically utilize a combination of convolutional neural networks for image feature extraction and recurrent neural networks for temporal modeling, with connectionist temporal classification used for text generation. However, these methods suffer from a lack of parallelization due to the sequential nature of recurrent neural networks. Furthermore, these models cannot account for linguistic rules, necessitating the use of an external language model in the post-processing stage to boost accuracy. To overcome these issues, we introduce two alternative architectures, namely the Transformer Transducer and the standard sequence-to-sequence Transformer, and compare their performance in terms of accuracy and speed. Our approach can model language dependencies and relies only on the attention mechanism, thereby making it more parallelizable and less complex. We employ pre-trained Transformers for both image understanding and language modeling. Our evaluation on the Arabic KHATT dataset demonstrates that our proposed method outperforms the current state-of-the-art approaches for recognizing offline Arabic handwritten text.
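The abstract contrasts the CNN+RNN+CTC pipeline with an attention-only sequence-to-sequence Transformer that maps a line image directly to characters. The following is a minimal, self-contained sketch of that idea in PyTorch; all dimensions, patch sizes, and vocabulary are hypothetical illustrations, not the paper's actual configuration (which uses pre-trained image and language Transformers):

```python
import torch
import torch.nn as nn

class HTRTransformer(nn.Module):
    """Sketch of a seq2seq Transformer for offline handwritten text
    recognition: a line image is split into patches, embedded, and decoded
    into characters via cross-attention -- no recurrence, no CTC.
    All sizes here are illustrative, not the paper's settings."""

    def __init__(self, vocab_size=64, d_model=128, patch_dim=32 * 8):
        super().__init__()
        self.patch_embed = nn.Linear(patch_dim, d_model)  # embed flattened image patches
        self.char_embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)  # per-step character logits

    def forward(self, patches, char_ids):
        # patches: (batch, num_patches, patch_dim); char_ids: (batch, seq_len)
        src = self.patch_embed(patches)
        tgt = self.char_embed(char_ids)
        # causal mask so each position only attends to earlier characters
        mask = nn.Transformer.generate_square_subsequent_mask(char_ids.size(1))
        h = self.transformer(src, tgt, tgt_mask=mask)
        return self.out(h)  # (batch, seq_len, vocab_size)

model = HTRTransformer()
patches = torch.randn(2, 40, 32 * 8)   # two line images, 40 patches each
chars = torch.randint(0, 64, (2, 10))  # teacher-forced character inputs
logits = model(patches, chars)
print(logits.shape)  # torch.Size([2, 10, 64])
```

Because decoding depends only on attention, all target positions are processed in parallel during training (with the causal mask), which is the parallelization advantage over RNN decoders that the abstract highlights.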


Data availability

The datasets analyzed during the current study are available from public resources.


Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Authors and Affiliations

Authors

Contributions

Both authors contributed equally to all aspects of the manuscript.

Corresponding author

Correspondence to Bagher BabaAli.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Ethical approval

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Momeni, S., BabaAli, B. A transformer-based approach for Arabic offline handwritten text recognition. SIViP 18, 3053–3062 (2024). https://doi.org/10.1007/s11760-023-02970-9

