Abstract
Arabic handwritten documents consist of text lines and words. A word is often a succession of sub-words (characters, connected components) separated by spaces. In Arabic handwriting these spaces are of two types: within-word spaces, which separate two connected components of the same word, and between-words spaces, which separate two connected components belonging to two consecutive words. Word extraction relies on classifying the detected spaces and using the between-words spaces to segment the text into words. In this paper, we present a method for segmenting Arabic handwritten text into lines and words. To make word extraction more effective, the space-classification threshold is not fixed for the whole document; instead, we compute a separate threshold for each line. The text must therefore first be segmented into lines so that the method can be applied to each line. To extract the lines, the text images are preprocessed before the proposed line segmentation method is applied. Our system is evaluated on the benchmark Arabic handwriting database for text recognition (AHDB), and the experimental results are very promising: we achieve a word extraction success rate of 87.9%.
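The per-line classification of spaces can be illustrated with a minimal Python sketch. The abstract does not state how the threshold is computed, so everything below is an assumption made for illustration: connected components are represented by their horizontal extents `(x_start, x_end)`, and the threshold for a line is chosen by an Otsu-style split of that line's gap widths (the split that maximizes between-class variance). The names `gaps_between`, `line_threshold`, and `segment_words` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

# Hypothetical representation: horizontal extent of one connected component.
Interval = Tuple[int, int]  # (x_start, x_end)


def gaps_between(components: List[Interval]) -> List[int]:
    """Return the horizontal gaps between consecutive components of one line."""
    ordered = sorted(components)
    gaps = []
    right_edge = ordered[0][1]
    for start, end in ordered[1:]:
        # Overlapping components produce a gap of zero.
        gaps.append(max(0, start - right_edge))
        right_edge = max(right_edge, end)
    return gaps


def line_threshold(gaps: List[int]) -> float:
    """Assumed criterion: split the sorted gap widths into 'small' and 'large'
    groups so that the between-class variance is maximal (Otsu-style).
    Returns infinity when no split exists (fewer than two distinct widths)."""
    values = sorted(gaps)
    n = len(values)
    best_score, best_threshold = -1.0, float("inf")
    for k in range(1, n):
        if values[k] == values[k - 1]:
            continue  # a threshold cannot separate equal widths
        mean_small = sum(values[:k]) / k
        mean_large = sum(values[k:]) / (n - k)
        score = k * (n - k) * (mean_large - mean_small) ** 2
        if score > best_score:
            best_score = score
            best_threshold = (values[k - 1] + values[k]) / 2
    return best_threshold


def segment_words(components: List[Interval]) -> List[List[Interval]]:
    """Group the components of one text line into words using a threshold
    computed from that line only."""
    if not components:
        return []
    ordered = sorted(components)
    gaps = gaps_between(ordered)
    threshold = line_threshold(gaps)
    words = [[ordered[0]]]
    for gap, component in zip(gaps, ordered[1:]):
        if gap > threshold:
            words.append([component])    # between-words space: start a new word
        else:
            words[-1].append(component)  # within-word space: same word
    return words


# One line with five components; the gaps are 2, 15, 2 and 20 pixels.
line = [(0, 10), (12, 20), (35, 45), (47, 60), (80, 90)]
words = segment_words(line)
```

Because the threshold is derived from each line's own gaps, a line written with wide spacing and a line written with tight spacing are classified independently. In this sketch a line whose gaps are all equal is kept as a single word, which is a limitation of the assumed criterion. Words are returned in left-to-right pixel order, so for Arabic the reading order is the reverse of the returned list.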
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Lamsaf, A., Aitkerroum, M., Boulaknadel, S., Fakhri, Y. (2019). Text Line and Word Extraction of Arabic Handwritten Documents. In: Ben Ahmed, M., Boudhir, A., Younes, A. (eds) Innovations in Smart Cities Applications Edition 2. SCA 2018. Lecture Notes in Intelligent Transportation and Infrastructure. Springer, Cham. https://doi.org/10.1007/978-3-030-11196-0_42
DOI: https://doi.org/10.1007/978-3-030-11196-0_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11195-3
Online ISBN: 978-3-030-11196-0
eBook Packages: Intelligent Technologies and Robotics (R0)