Database for Arabic Printed Text Recognition Research

  • Faten Kallel Jaiem
  • Slim Kanoun
  • Maher Khemakhem
  • Haikal El Abed
  • Jihain Kardoun
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8156)


This paper presents a real database for Arabic printed text recognition, APTID / MF (Arabic Printed Text Image Database / Multi-Font). This database can be used to evaluate systems that recognize Arabic printed text with an open vocabulary. APTID / MF may also be used for research in word segmentation and font identification. APTID / MF is obtained from 387 pages of Arabic printed documents scanned in grayscale at a resolution of 300 dpi. From these documents, 1,845 text blocks have been extracted. In addition, a ground truth file is provided for each text block. APTID / MF also includes an Arabic printed character image dataset made up of 27,402 samples. The database is freely available to interested researchers.


Arabic printed text; APTID / MF database; Open vocabulary; Ground truth



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Faten Kallel Jaiem (1)
  • Slim Kanoun (1)
  • Maher Khemakhem (1)
  • Haikal El Abed (2)
  • Jihain Kardoun (3)
  1. MIRACL laboratory, ISIMS, University of Sfax, Tunisia
  2. Institute for Communications Technology, Braunschweig University, Germany
  3. Department of Computer Engineering, ENIS, University of Sfax, Tunisia
