Abstract
With advances in optical character recognition (OCR) technology, it is now possible to digitize printed and handwritten documents in many scripts and languages and make them editable and searchable. For Urdu script, however, segmentation remains the major challenge, and because of the script's complexity most researchers have left Urdu text segmentation untouched. Well-designed preprocessing for Urdu script may reduce these complexities and simplify segmentation. Noise removal in Urdu is itself difficult, because dots and modifiers carry meaning yet resemble noise. In a character recognition system, preprocessing aims to remove or reduce noise, normalize the image against variations such as skew, slant, and size, and minimize storage requirements to increase processing speed. This paper reviews preprocessing techniques proposed in the literature for Arabic, Persian, Jawi, and Urdu. It also enhances dark and noisy Urdu document images using histogram equalization, spatial max and median filters, and frequency-domain Gaussian lowpass filters. The resulting noise-free document images can help improve subsequent segmentation and feature extraction.
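The abstract names three enhancement operations but not their parameters or order. The following is a minimal NumPy sketch of each, assuming 8-bit grayscale input with dark text on a light background. The function names, the edge-replicating border handling, the 3x3 window, and the cutoff `d0` are illustrative assumptions, not the authors' settings. On a light background a max filter erases dark specks, but with too large a window it can also erase the dots and thin strokes that the abstract warns resemble noise.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def histogram_equalize(img: np.ndarray) -> np.ndarray:
    """Global histogram equalization of an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]              # CDF value at the darkest occupied level
    denom = cdf[-1] - cdf_min
    if denom == 0:                          # constant image: nothing to stretch
        return img.copy()
    lut = np.round((cdf - cdf_min) / denom * 255)
    return np.clip(lut, 0, 255).astype(np.uint8)[img]


def rank_filter(img: np.ndarray, size: int, op) -> np.ndarray:
    """Apply a size x size order-statistic filter, e.g. op=np.median or np.max."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")  # assumption: replicate border pixels
    windows = sliding_window_view(padded, (size, size))
    return op(windows, axis=(-2, -1)).astype(img.dtype)


def gaussian_lowpass(img: np.ndarray, d0: float) -> np.ndarray:
    """Frequency-domain Gaussian lowpass filter, H(u, v) = exp(-D^2 / (2 * d0^2))."""
    rows, cols = img.shape
    F = np.fft.fftshift(np.fft.fft2(img.astype(np.float64)))
    u = np.arange(rows) - rows // 2         # fftshift places DC at index n // 2
    v = np.arange(cols) - cols // 2
    D2 = u[:, None] ** 2 + v[None, :] ** 2  # squared distance from the spectrum centre
    H = np.exp(-D2 / (2.0 * d0 ** 2))
    out = np.real(np.fft.ifft2(np.fft.ifftshift(F * H)))
    return np.clip(np.round(out), 0, 255).astype(np.uint8)


if __name__ == "__main__":
    # Hypothetical page patch: light background (200) with one isolated dark speck.
    page = np.full((5, 5), 200, dtype=np.uint8)
    page[2, 2] = 0
    cleaned = rank_filter(page, 3, np.median)
    print(cleaned.min())  # → 200 (the speck is replaced by the background value)
```

The median filter suits isolated salt-and-pepper specks, while the Gaussian lowpass smooths broadband noise at the cost of blurring stroke edges; a smaller `d0` removes more noise but blurs more.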
Acknowledgments
This work is sponsored by a G.H. Raisoni Doctoral Fellowship, North Maharashtra University, Jalgaon. The author gratefully acknowledges this financial support.
Copyright information
© 2013 Springer India
About this paper
Cite this paper
Ramteke, R.J., Pathan, I.K. (2013). Noise Reduction in Urdu Document Image–Spatial and Frequency Domain Approaches. In: S, M., Kumar, S. (eds) Proceedings of the Fourth International Conference on Signal and Image Processing 2012 (ICSIP 2012). Lecture Notes in Electrical Engineering, vol 222. Springer, India. https://doi.org/10.1007/978-81-322-1000-9_42
DOI: https://doi.org/10.1007/978-81-322-1000-9_42
Publisher Name: Springer, India
Print ISBN: 978-81-322-0999-7
Online ISBN: 978-81-322-1000-9
eBook Packages: Engineering, Engineering (R0)