A Comparative analysis for identification and classification of text segmentation challenges in Takri Script

Abstract

Takri is a regional class of Indian scripts used in the hilly areas of north-west India, including Jammu and Kashmir (J&K), Himachal Pradesh (H.P.), Punjab and Uttarakhand. The script varies widely: about 13 variants have been identified across the region. To our knowledge, no work on text identification and recognition of Takri script has been done so far. Our work therefore focuses on identifying and classifying the challenges the script poses, based on a comparative analysis of existing text segmentation approaches, since correct segmentation of text leads to more accurate machine recognition. As no metal fonts were available for the script, machine-printed data had to be collected to address the text identification problem. The paper surveys different text segmentation approaches and, based on the structural properties of the script, applies them to Takri data in three steps: the Gurmukhi segmentation technique, the connected component segmentation approach, and the Gurmukhi touching-character segmentation approach. The results are analyzed for segmentation accuracy, and the challenges are identified along with their statistical analysis. The identified challenges, namely half-forms, numerous types of touching characters, and overlapping bounding boxes, are then classified. The identified challenge classes were evaluated using a Naïve Bayesian machine learning algorithm. The results showed 80% accuracy in text identification and classification of Takri script.
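As an illustration of the connected component approach and of the overlapping bounding boxes challenge named above, the following is a minimal sketch, not the authors' implementation. It assumes a binarized image stored as a list of rows with 1 for ink and 0 for background, and it assumes 8-connectivity; the function names and the toy image are hypothetical.

```python
from collections import deque


def connected_components(image):
    """Return one bounding box (top, left, bottom, right), inclusive,
    for every 8-connected group of ink pixels (value 1)."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def boxes_overlap(a, b):
    """True if two inclusive bounding boxes share at least one pixel position."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def overlapping_pairs(boxes):
    """Index pairs of components whose bounding boxes intersect."""
    return [(i, j)
            for i in range(len(boxes))
            for j in range(i + 1, len(boxes))
            if boxes_overlap(boxes[i], boxes[j])]


# Hypothetical toy image: an L-shaped stroke, a short stroke tucked under
# its arm, and an isolated stroke to the right.
TOY_IMAGE = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
]

boxes = connected_components(TOY_IMAGE)
pairs = overlapping_pairs(boxes)
```

In this toy image the L-shaped stroke and the short stroke beneath it are separate components, yet their bounding boxes intersect, so a cut based only on rectangular boxes could not isolate them cleanly. Touching characters are the opposite case: two glyphs merge into a single component and need a further splitting step.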

Acknowledgements

We thank Padmashri Vijay Sharma, artist and Takri expert, Bhuri Singh Museum, Chamba, H.P., for sharing his valuable expertise on this rare script and for his immense help in gathering rare books and material written in the script. We also thank Dr Shiv Nirmohi, a Dogri writer, and Dr Sangeeta Sharma, State Archival Department, Jammu, for providing all the support needed for researching Takri script.

Author information

Corresponding author

Correspondence to Shikha Magotra.

About this article

Cite this article

Magotra, S., Kaushik, B. & Kaul, A. A Comparative analysis for identification and classification of text segmentation challenges in Takri Script. Sādhanā 45, 146 (2020). https://doi.org/10.1007/s12046-020-01384-4

Keywords

  • Connected component segmentation
  • Gurmukhi
  • Takri script
  • Touching characters
  • Half-forms
  • Overlapping bounding boxes