Abstract
Urdu is written in the Nastalique writing style, which is highly cursive and context sensitive. It is difficult to process because only the last character of a ligature rests on the baseline. This paper presents the development of an OCR system that uses Hidden Markov Models and a rule-based post-processor. The recognizer takes the main body of a ligature (without diacritics) as input and recognizes the corresponding ligature. The system achieves an accuracy of 92.73% on document images that were printed and then scanned at a font size of 36.
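As a rough illustration of how HMM-based recognition of this kind can work, the sketch below scores an observation sequence against several candidate models with the forward algorithm and picks the best-scoring one. It is a minimal sketch under stated assumptions, not the system described in the paper: the discrete symbol alphabet, the two toy models, and the names `forward_log_likelihood`, `recognize`, `MODELS`, `ligature_a` and `ligature_b` are all hypothetical, and the feature extraction and rule-based post-processing steps are not shown.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM (scaled forward algorithm)."""
    if not obs:
        return 0.0  # an empty sequence has probability 1 under any model
    n = len(start)
    # Initialise with the start distribution and the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        # Rescale to avoid underflow; the log of the scales sums to log P(obs).
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik

def recognize(obs, models):
    """Return the label of the model that gives obs the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Hypothetical two-state, left-to-right toy models over the symbols {0, 1}.
# Each entry is (start probabilities, transition matrix, emission matrix).
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
MODELS = {
    "ligature_a": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.1, 0.9]]),
    "ligature_b": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.9, 0.1]]),
}

if __name__ == "__main__":
    print(recognize([0, 0, 1, 1], MODELS))
    print(recognize([1, 1, 0, 0], MODELS))
```

In a real recognizer the observations would be feature vectors extracted from the ligature's main body, and the model parameters would be trained rather than written by hand.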
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Javed, S.T., Hussain, S. (2013). Segmentation Based Urdu Nastalique OCR. In: Ruiz-Shulcloper, J., Sanniti di Baja, G. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2013. Lecture Notes in Computer Science, vol 8259. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41827-3_6
DOI: https://doi.org/10.1007/978-3-642-41827-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41826-6
Online ISBN: 978-3-642-41827-3
eBook Packages: Computer Science, Computer Science (R0)