Abstract
In this chapter, we present the challenges of baseline detection for Arabic-script-based languages, targeting the Nastaliq and Naskh writing styles. Baseline detection is an important step in OCR because it directly affects the subsequent steps and improves the performance and efficiency of character segmentation and feature extraction. Character recognition is harder for Arabic script than for Latin text because Arabic script is cursive and context-sensitive and is written in several different styles. We provide a comprehensive review of baseline detection methods for the Urdu language. The aim of the chapter is to introduce the challenges of baseline detection in cursive script languages written in the Nastaliq and Naskh styles.
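To make the idea of a baseline concrete, here is a minimal sketch of one widely used estimate, the horizontal projection profile: count the ink pixels in each row of a text-line image and take the densest row as the baseline. This is an illustration only, not the authors' method. The function names and the 0/1 list-of-lists image format are assumptions chosen for the example.

```python
def horizontal_projection(image):
    """Return the number of ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def estimate_baseline(image):
    """Estimate the baseline row of a single text line as the densest row.

    Assumes `image` is a list of rows, each a list of 0 (background) and
    1 (ink), containing one roughly horizontal text line. Ties go to the
    topmost densest row, because max() keeps the first maximum it sees.
    """
    profile = horizontal_projection(image)
    if not profile:
        raise ValueError("image has no rows")
    return max(range(len(profile)), key=profile.__getitem__)


# Toy line: row 3 is a long horizontal stroke, like a Naskh baseline.
toy_line = [
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 0],
]
print(horizontal_projection(toy_line))  # [1, 2, 2, 8, 1]
print(estimate_baseline(toy_line))      # 3
```

In Naskh text the connecting strokes tend to fall on one row, so the projection peak is sharp. Nastaliq ligatures slope diagonally from right to left and spread ink over many rows, so a single global peak is generally a much weaker cue for that style.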
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Naz, S., Razzak, M.I., Hayat, K., Anwar, M.W., Khan, S.Z. (2014). Challenges in Baseline Detection of Arabic Script Based Languages. In: Chen, L., Kapoor, S., Bhatia, R. (eds) Intelligent Systems for Science and Information. Studies in Computational Intelligence, vol 542. Springer, Cham. https://doi.org/10.1007/978-3-319-04702-7_11
DOI: https://doi.org/10.1007/978-3-319-04702-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04701-0
Online ISBN: 978-3-319-04702-7
eBook Packages: Engineering (R0)