Text Line and Word Extraction of Arabic Handwritten Documents

  • Conference paper
Innovations in Smart Cities Applications Edition 2 (SCA 2018)

Abstract

Arabic handwritten documents contain text lines and words. A word is often a succession of sub-words (characters, connected components) separated by spaces. In Arabic handwriting these spaces are of two types: within-word spaces, which separate two connected components of the same word, and between-word spaces, which separate two connected components belonging to two consecutive words. Word extraction relies on classifying the detected spaces and using the between-word spaces to segment the text into words. In this paper, we present a method for segmenting Arabic handwritten text into lines and words. To make word extraction more accurate, the space threshold is not fixed for the whole document; instead, we compute it for each line, so that each line has its own space classification threshold. Because the method operates line by line, the text must first be segmented into text lines. To extract the lines, preprocessing is applied to the text images before the proposed line segmentation step. Our system is evaluated on the benchmark Arabic handwriting database for text recognition (AHDB), and the experimental results are promising: we achieve a word extraction success rate of 87.9%.
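The per-line classification of spaces described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the threshold is computed, so the mean gap width of the line is used here purely as an assumption, and the names `Box`, `line_threshold`, and `split_line_into_words` are hypothetical. Each connected component is reduced to its horizontal extent, and components are scanned from right to left, following the Arabic writing direction.

```python
from statistics import mean
from typing import List, Tuple

# Horizontal extent (x_start, x_end) of one connected component in a text line.
Box = Tuple[int, int]


def line_threshold(gaps: List[int]) -> float:
    """Assumed per-line threshold: the mean gap width of the line.

    The abstract only says the threshold is computed per line; the mean
    is a stand-in chosen for illustration.
    """
    return mean(gaps) if gaps else 0.0


def split_line_into_words(components: List[Box]) -> List[List[Box]]:
    """Group the components of one line into words, right to left."""
    if not components:
        return []
    # Arabic is written right to left: start from the rightmost component.
    ordered = sorted(components, key=lambda box: box[1], reverse=True)
    # Gap between each component and its left neighbour (overlaps count as 0).
    gaps = [max(0, right[0] - left[1]) for right, left in zip(ordered, ordered[1:])]
    threshold = line_threshold(gaps)
    words: List[List[Box]] = [[ordered[0]]]
    for component, gap in zip(ordered[1:], gaps):
        if gap > threshold:
            words.append([component])    # between-word space: start a new word
        else:
            words[-1].append(component)  # within-word space: same word
    return words


if __name__ == "__main__":
    line = [(30, 40), (52, 58), (60, 70), (80, 88), (90, 100)]
    for word in split_line_into_words(line):
        print(word)
```

Because the threshold is recomputed from the gaps of each line, a line written with wide spacing and a line written with tight spacing in the same document are classified against different values, which is the property the abstract emphasises.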

Author information

Corresponding author

Correspondence to Asmae Lamsaf.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Lamsaf, A., Aitkerroum, M., Boulaknadel, S., Fakhri, Y. (2019). Text Line and Word Extraction of Arabic Handwritten Documents. In: Ben Ahmed, M., Boudhir, A., Younes, A. (eds) Innovations in Smart Cities Applications Edition 2. SCA 2018. Lecture Notes in Intelligent Transportation and Infrastructure. Springer, Cham. https://doi.org/10.1007/978-3-030-11196-0_42
