Abstract
Research on offline handwritten Arabic character recognition has received increasing attention in recent years because of the growing need to digitize Arabic documents. Variation in Arabic handwriting makes character segmentation and recognition difficult; e.g., the sub-parts (diacritics) of an Arabic character may shift away from its main part. In this paper, a new probabilistic segmentation model is proposed. First, a contour-based over-segmentation method cuts the word image into graphemes. The graphemes are sorted into three queues: character main parts, sub-parts (diacritics) above the main parts, and sub-parts (diacritics) below the main parts. The confidence for each character is calculated by the probabilistic model, taking into account both the recognizer output and the geometric confidence, together with logical constraints. Then, a global optimization is conducted to find the optimal cutting path, taking the weighted average of character confidences as the objective function. Experiments on handwritten Arabic documents with various writing styles show that the proposed method is effective.
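The global optimization step can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation: the abstract does not specify the weights or the search procedure, so the sketch assumes each candidate character is weighted by the number of graphemes it spans (which makes the total weight constant, so maximizing the weighted sum also maximizes the weighted average). The names `best_cutting_path`, `confidence`, and `max_span` are hypothetical, and the confidence function is a stand-in for the paper's probabilistic model.

```python
from typing import Callable, List, Tuple


def best_cutting_path(
    n_graphemes: int,
    confidence: Callable[[int, int], float],
    max_span: int = 4,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Find the grouping of consecutive graphemes into characters that
    maximizes the span-weighted average of character confidences.

    confidence(start, end) is a hypothetical stand-in for the probabilistic
    model: it scores graphemes [start, end) merged into one character.
    max_span (an assumption) caps how many graphemes one character may use.
    """
    if n_graphemes <= 0 or max_span < 1:
        raise ValueError("need at least one grapheme and max_span >= 1")

    neg_inf = float("-inf")
    best = [neg_inf] * (n_graphemes + 1)  # best weighted sum ending at cut i
    back = [-1] * (n_graphemes + 1)       # previous cut on the best path
    best[0] = 0.0

    for end in range(1, n_graphemes + 1):
        for start in range(max(0, end - max_span), end):
            if best[start] == neg_inf:
                continue
            # Weight = number of graphemes in the candidate character.
            score = best[start] + (end - start) * confidence(start, end)
            if score > best[end]:
                best[end] = score
                back[end] = start

    # Walk the back-pointers to recover the chosen character spans.
    spans: List[Tuple[int, int]] = []
    pos = n_graphemes
    while pos > 0:
        spans.append((back[pos], pos))
        pos = back[pos]
    spans.reverse()
    return best[n_graphemes] / n_graphemes, spans


# Toy confidences (made-up values, for illustration only).
_toy = {(0, 2): 0.9, (2, 3): 0.8, (3, 5): 0.7}


def toy_confidence(start: int, end: int) -> float:
    return _toy.get((start, end), 0.1)


if __name__ == "__main__":
    average, spans = best_cutting_path(5, toy_confidence)
    print(average, spans)
```

In a fuller system, the confidence function would combine the recognizer output, the geometric confidence, and the logical constraints described above, and diacritic graphemes from the upper and lower queues would be attached to main-part candidates before scoring.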
This paper is supported by the National Natural Science Foundation of China (project 60472002).
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xiu, P., Peng, L., Ding, X., Wang, H. (2006). Offline Handwritten Arabic Character Segmentation with Probabilistic Model. In: Bunke, H., Spitz, A.L. (eds) Document Analysis Systems VII. DAS 2006. Lecture Notes in Computer Science, vol 3872. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11669487_36
DOI: https://doi.org/10.1007/11669487_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32140-8
Online ISBN: 978-3-540-32157-6
eBook Packages: Computer Science, Computer Science (R0)