Noise Reduction in Urdu Document Image–Spatial and Frequency Domain Approaches

  • Conference paper
Proceedings of the Fourth International Conference on Signal and Image Processing 2012 (ICSIP 2012)

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 222)

Abstract

With advances in optical character recognition technology, it is now possible to digitize printed and handwritten documents and make them editable and searchable for many scripts and languages. For Urdu script, however, segmentation remains the major unresolved challenge. Most researchers have left the segmentation of Urdu text untouched because of the complexity of the script. Suitable preprocessing may reduce this complexity and simplify the segmentation process. Noise removal in Urdu is difficult because dots and modifiers, which are significant parts of the script, resemble noise. In a character recognition system, preprocessing aims to remove or reduce noise, normalize the image against variations such as skew, slant, and size, and minimize the storage requirement to increase processing speed. This paper reviews preprocessing techniques proposed in the literature for Arabic, Persian, Jawi, and Urdu. It also enhances dark and noisy Urdu document images using histogram equalization, spatial max and median filters, and frequency-domain Gaussian lowpass filters. The resulting denoised document images can help improve the subsequent segmentation and feature extraction stages.
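The enhancement steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 3x3 window, the edge-replication rule, and the cutoff parameter `d0` are all assumptions made here for illustration. The Gaussian lowpass step uses the textbook transfer function H(u, v) = exp(-D(u, v)^2 / (2 * d0^2)) with a naive DFT, so it is only practical for very small images.

```python
import cmath
import math
from statistics import median


def equalize_histogram(img, levels=256):
    """Histogram-equalize a grayscale image given as a list of rows of ints."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in img]
    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in img]


def rank_filter(img, reducer, size=3):
    """Apply reducer (e.g. median or max) over a size x size window.

    Pixels outside the image are taken from the nearest edge pixel
    (an assumption; other border rules are equally valid).
    """
    h, w, r = len(img), len(img[0]), size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            row.append(reducer(window))
        out.append(row)
    return out


def gaussian_lowpass(img, d0):
    """Frequency-domain Gaussian lowpass using a naive O(N^4) DFT."""
    h, w = len(img), len(img[0])

    def dft2(data, sign):
        # sign=-1 gives the forward transform, sign=+1 the unnormalized inverse.
        return [[sum(data[y][x]
                     * cmath.exp(sign * 2j * math.pi * (u * y / h + v * x / w))
                     for y in range(h) for x in range(w))
                 for v in range(w)] for u in range(h)]

    spectrum = dft2(img, -1)
    for u in range(h):
        for v in range(w):
            # Distance from the zero frequency, accounting for wrap-around.
            du, dv = min(u, h - u), min(v, w - v)
            spectrum[u][v] *= math.exp(-(du * du + dv * dv) / (2 * d0 * d0))
    restored = dft2(spectrum, +1)
    return [[restored[y][x].real / (h * w) for x in range(w)] for y in range(h)]
```

For example, `rank_filter(img, median)` suppresses isolated bright or dark specks, and `rank_filter(img, max)` removes small dark specks on a light background. Both can also erase genuine dots and diacritics of similar size, which is the difficulty the abstract highlights for Urdu, so window sizes would need to be chosen with the script's dot size in mind.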


Acknowledgments

This work is sponsored by a G.H. Raisoni Doctoral Fellowship, North Maharashtra University, Jalgaon. The author gratefully acknowledges this financial support.

Author information

Corresponding author

Correspondence to R. J. Ramteke.

Copyright information

© 2013 Springer India

About this paper

Cite this paper

Ramteke, R.J., Pathan, I.K. (2013). Noise Reduction in Urdu Document Image–Spatial and Frequency Domain Approaches. In: S, M., Kumar, S. (eds) Proceedings of the Fourth International Conference on Signal and Image Processing 2012 (ICSIP 2012). Lecture Notes in Electrical Engineering, vol 222. Springer, India. https://doi.org/10.1007/978-81-322-1000-9_42

  • DOI: https://doi.org/10.1007/978-81-322-1000-9_42

  • Publisher Name: Springer, India

  • Print ISBN: 978-81-322-0999-7

  • Online ISBN: 978-81-322-1000-9

  • eBook Packages: Engineering, Engineering (R0)
