
A HMM-Based Arabic/Latin Handwritten/Printed Identification System

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 552)

Abstract

In document analysis and recognition, script identification is an important preprocessing step in the design of multi-script OCR systems. In this paper, we propose a novel HMM-based identification system that recognizes, in a single stage, both the writing type (handwritten or machine-printed) and the script (Arabic or Latin) of the input image. The proposed system is based on Histogram of Oriented Gradients (HOG) features, which have shown useful properties for script characterization. Experiments conducted on word and line images collected from public databases show promising results.
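
The abstract does not give the window size, the HMM topology, or the toolkit used, so the following is only a minimal sketch of the general idea under stated assumptions: a sliding window turns a text image into a left-to-right sequence of gradient-orientation histograms (a simplified HOG-like descriptor, not the full HOG pipeline), and one diagonal-Gaussian HMM per class scores that sequence, with the highest log-likelihood deciding the class. The function names, the parameters `win`, `step` and `bins`, and the layout of the `models` dictionary are all hypothetical, and training the HMMs is not shown.

```python
import numpy as np

def hog_sequence(image, win=8, step=4, bins=8):
    """Slide a window left to right over a grayscale image (H x W) and
    return one L2-normalised orientation histogram per window position."""
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)                    # gradients along rows, columns
    mag = np.hypot(gx, gy)                       # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientation in [0, pi)
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    frames = []
    for x in range(0, img.shape[1] - win + 1, step):
        m = mag[:, x:x + win].ravel()
        b = bin_idx[:, x:x + win].ravel()
        hist = np.bincount(b, weights=m, minlength=bins)  # magnitude-weighted votes
        norm = np.linalg.norm(hist)
        frames.append(hist / norm if norm > 0 else hist)
    return np.array(frames)

def log_likelihood(frames, log_pi, log_A, means, variances):
    """Forward algorithm in the log domain for an HMM with diagonal
    Gaussian emissions. Shapes: log_pi (N,), log_A (N, N),
    means and variances (N, D), frames (T, D)."""
    diff = frames[:, None, :] - means[None, :, :]
    log_b = -0.5 * np.sum(diff ** 2 / variances
                          + np.log(2 * np.pi * variances), axis=2)  # (T, N)
    alpha = log_pi + log_b[0]
    for t in range(1, len(frames)):
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)

def identify(frames, models):
    """Return the class name whose HMM gives the highest log-likelihood.
    `models` maps a class name to (log_pi, log_A, means, variances)."""
    return max(models, key=lambda name: log_likelihood(frames, *models[name]))
```

In this sketch, a single decision over four class models (for example Arabic printed, Arabic handwritten, Latin printed, Latin handwritten) would correspond to identifying writing type and script in one stage; the model parameters would have to come from training on labelled word or line images.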



Author information


Corresponding author

Correspondence to Ahmed Cheikh Rouhou.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Cheikh Rouhou, A., Abdelhedi, Z., Kessentini, Y. (2017). A HMM-Based Arabic/Latin Handwritten/Printed Identification System. In: Abraham, A., Haqiq, A., Alimi, A., Mezzour, G., Rokbani, N., Muda, A. (eds) Proceedings of the 16th International Conference on Hybrid Intelligent Systems (HIS 2016). HIS 2016. Advances in Intelligent Systems and Computing, vol 552. Springer, Cham. https://doi.org/10.1007/978-3-319-52941-7_30

  • DOI: https://doi.org/10.1007/978-3-319-52941-7_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52940-0

  • Online ISBN: 978-3-319-52941-7

  • eBook Packages: Engineering, Engineering (R0)
