Abstract
In document analysis and recognition, script identification is an important preprocessing step in the design of multi-script OCR systems. In this paper, we propose a novel HMM-based identification system that recognizes, in a single stage, both the writing type (handwritten or machine-printed) and the script (Arabic or Latin) of the input image. The proposed system is based on Histogram of Oriented Gradients (HOG) features, which have demonstrated useful properties for script characterization. Experiments conducted on word and line images collected from public databases show promising results.
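As a rough illustration of how HOG features can feed a sequence model such as an HMM, the sketch below slides a window across a text image from left to right and emits one gradient-orientation histogram per window. It is a minimal sketch under stated assumptions, not the configuration used in the paper: the function name `hog_frames`, the window width, the step, and the number of orientation bins are all hypothetical choices, since the abstract does not give them.

```python
import numpy as np

def hog_frames(image, win_width=8, step=4, n_bins=8):
    """Turn a grayscale text image into a left-to-right sequence of
    gradient-orientation histograms, one per sliding window.

    win_width, step and n_bins are illustrative assumptions.
    """
    img = np.asarray(image, dtype=float)
    # Gradients along rows (gy) and along columns (gx).
    gy, gx = np.gradient(img)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation, folded into [0, pi).
    orientation = np.mod(np.arctan2(gy, gx), np.pi)
    # Clamp guards against rounding that lands exactly on pi.
    bin_index = np.minimum((orientation / np.pi * n_bins).astype(int), n_bins - 1)

    frames = []
    width = img.shape[1]
    for x0 in range(0, max(width - win_width, 0) + 1, step):
        mags = magnitude[:, x0:x0 + win_width].ravel()
        bins = bin_index[:, x0:x0 + win_width].ravel()
        # Magnitude-weighted orientation histogram for this window.
        hist = np.bincount(bins, weights=mags, minlength=n_bins)
        norm = np.linalg.norm(hist)
        frames.append(hist / norm if norm > 0 else hist)
    return np.array(frames)
```

In a system of the kind the abstract describes, a sequence like this would presumably be scored by one HMM per class (Arabic or Latin, handwritten or machine-printed), with the best-scoring model giving the decision; that scoring step is not shown here.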
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Cheikh Rouhou, A., Abdelhedi, Z., Kessentini, Y. (2017). A HMM-Based Arabic/Latin Handwritten/Printed Identification System. In: Abraham, A., Haqiq, A., Alimi, A., Mezzour, G., Rokbani, N., Muda, A. (eds) Proceedings of the 16th International Conference on Hybrid Intelligent Systems (HIS 2016). HIS 2016. Advances in Intelligent Systems and Computing, vol 552. Springer, Cham. https://doi.org/10.1007/978-3-319-52941-7_30
DOI: https://doi.org/10.1007/978-3-319-52941-7_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52940-0
Online ISBN: 978-3-319-52941-7
eBook Packages: Engineering, Engineering (R0)