Segmentation Based Urdu Nastalique OCR

  • Sobia Tariq Javed
  • Sarmad Hussain
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8259)

Abstract

Urdu is written in the Nastalique writing style. This style is highly cursive and context sensitive, and it is difficult to process because only the last character of a ligature rests on the baseline. This paper focuses on the development of an OCR system using a Hidden Markov Model (HMM) and a rule-based post-processor. The recognizer takes the main body of a ligature (without diacritics) as input and recognizes the corresponding ligature. The system achieves an accuracy of 92.73% on document images printed at font size 36 and then scanned.
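The abstract describes a recognizer that maps a ligature's main body to a ligature label using HMMs. The following is a minimal sketch, not the authors' implementation. It assumes that each main body has already been converted into a sequence of discrete feature symbols and that one left-to-right discrete HMM has been trained per ligature. Under those assumptions, recognition scores the observation sequence against every ligature model with the forward algorithm and returns the most likely label. The class `DiscreteHMM`, the functions `log_likelihood` and `recognize`, the labels `ligature_a` and `ligature_b`, and all toy parameters are hypothetical. The paper's actual features, model topology, and training setup are not reproduced here.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DiscreteHMM:
    """Hypothetical discrete HMM for one ligature class (toy parameters)."""
    start: List[float]          # initial state probabilities pi_i
    trans: List[List[float]]    # transition probabilities a_ij
    emit: List[List[float]]     # emission probabilities b_j(k) for symbol k


def _log(p: float) -> float:
    # Map zero probabilities to -inf so impossible paths drop out.
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(values: Sequence[float]) -> float:
    # Numerically stable log(sum(exp(v))) over log-probabilities.
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(hmm: DiscreteHMM, obs: Sequence[int]) -> float:
    """Forward algorithm in log space: returns log P(obs | hmm)."""
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    n = len(hmm.start)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [_log(hmm.start[i]) + _log(hmm.emit[i][obs[0]]) for i in range(n)]
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
    for symbol in obs[1:]:
        alpha = [
            _logsumexp([alpha[i] + _log(hmm.trans[i][j]) for i in range(n)])
            + _log(hmm.emit[j][symbol])
            for j in range(n)
        ]
    # Termination: P(obs) = sum_i alpha_T(i)
    return _logsumexp(alpha)


def recognize(models: Dict[str, DiscreteHMM], obs: Sequence[int]) -> str:
    """Return the label whose HMM gives the observations the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(models[label], obs))


# Two toy left-to-right models over a 3-symbol feature alphabet.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
models = {
    "ligature_a": DiscreteHMM([1.0, 0.0], LEFT_TO_RIGHT,
                              [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
    "ligature_b": DiscreteHMM([1.0, 0.0], LEFT_TO_RIGHT,
                              [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]),
}

print(recognize(models, [0, 0, 2, 2]))  # ligature_a
print(recognize(models, [2, 2, 0, 0]))  # ligature_b
```

In a full system, the toy parameters would be replaced by models trained on main-body images, for example with Baum-Welch re-estimation. The rule-based post-processor mentioned in the abstract would then operate on the chosen label. Neither step is shown in this sketch.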

Keywords

Nastalique, Urdu OCR, Urdu Segmentation

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Sobia Tariq Javed: National University of Computer and Emerging Sciences, Lahore, Pakistan
  • Sarmad Hussain: Al-Khawarizmi Institute of Computer Science, University of Engineering and Technology, Lahore, Pakistan