
An Efficient Segmentation Technique for Urdu Optical Character Recognizer (OCR)

  • Saud Ahmed Malik
  • Muazzam Maqsood
  • Farhan Aadil
  • Muhammad Fahad Khan
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 70)

Abstract

In cursive languages such as Urdu, segmenting handwritten text lines is difficult because of context sensitivity, the diagonal slope of the text, and similar factors. In this work, we present a simple and robust line segmentation algorithm for handwritten and printed Urdu text. The proposed algorithm uses a modified header and baseline detection method that relies purely on pixel counting; it segments handwritten and printed Urdu text lines efficiently and also detects skew. Handwritten and printed Urdu text datasets were generated manually to evaluate the algorithm: the handwritten dataset consists of 80 pages containing 687 text lines, and the printed dataset consists of 48 pages containing 495 text lines. The algorithm performed strongly on printed documents and on handwritten Urdu documents with well-separated lines, and moderately well on documents containing overlapping words.
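The abstract describes line segmentation, header and baseline detection, and skew detection built purely on pixel counting. The sketch below illustrates that general family of techniques with horizontal projection profiles; it is not the authors' modified method. It assumes a binarized page given as a list of rows with 1 for ink, and the function names, the half-of-peak header rule, the peak-row baseline rule, and the ±5° search range in 0.5° steps are all illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

# Assumed input: a binarized page, one list per row, 1 = ink pixel, 0 = background.
Image = List[List[int]]


def row_profile(img: Image) -> List[int]:
    """Horizontal projection profile: the number of ink pixels in each row."""
    return [sum(row) for row in img]


def segment_lines(img: Image, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return inclusive (top, bottom) row ranges of text lines.

    A line is a maximal run of rows with at least `min_ink` ink pixels;
    rows below that count act as separators between lines.
    """
    lines: List[Tuple[int, int]] = []
    start = None
    for y, count in enumerate(row_profile(img)):
        if count >= min_ink and start is None:
            start = y                      # entering a text band
        elif count < min_ink and start is not None:
            lines.append((start, y - 1))   # leaving a text band
            start = None
    if start is not None:                  # band runs to the last row
        lines.append((start, len(img) - 1))
    return lines


def header_and_baseline(img: Image, top: int, bottom: int) -> Tuple[int, int]:
    """Estimate (header, baseline) rows for the line spanning top..bottom.

    Assumption (illustrative, not the paper's rule): the baseline is the row
    with the most ink, and the header is the first row whose ink count
    reaches at least half of that peak.
    """
    counts = row_profile(img)[top:bottom + 1]
    peak = max(counts)
    header = top + next(i for i, c in enumerate(counts) if 2 * c >= peak)
    baseline = top + counts.index(peak)
    return header, baseline


def estimate_skew(img: Image, max_deg: float = 5.0, step_deg: float = 0.5) -> float:
    """Projection-profile skew estimate in degrees.

    Each candidate angle shears the ink pixels back by that angle and bins
    them by row; the angle with the sharpest profile (largest sum of squared
    bin counts) wins. Positive means the text descends to the right (row
    index grows with x). Candidates are tried smallest |angle| first, so
    ties favor the smaller correction.
    """
    ink = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    steps = int(round(max_deg / step_deg))
    candidates = sorted((k * step_deg for k in range(-steps, steps + 1)), key=abs)
    best_angle, best_score = 0.0, -1
    for angle in candidates:
        t = math.tan(math.radians(angle))
        bins: Dict[int, int] = {}
        for x, y in ink:
            r = round(y - x * t)           # row this pixel falls in after de-skewing
            bins[r] = bins.get(r, 0) + 1
        score = sum(c * c for c in bins.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

A blank-row rule like the one in `segment_lines` cannot separate lines whose words touch or overlap vertically; the abstract likewise reports weaker results on documents with overlapping words.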

Keywords

Urdu OCR, Text line segmentation, Skew detection, Header, Baseline detection

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Saud Ahmed Malik (1)
  • Muazzam Maqsood (1), corresponding author
  • Farhan Aadil (1)
  • Muhammad Fahad Khan (1)
  1. Department of Computer Science, COMSATS University Islamabad, Islamabad, Pakistan
