A Comparative Analysis for Identification and Classification of Text Segmentation Challenges in Takri Script


Takri is a class of regional Indian scripts used in the hilly areas of north-west India, including Jammu and Kashmir (J & K), Himachal Pradesh (H.P.), Punjab and Uttarakhand. The script varies widely; about 13 variants have been identified across north-west India. To our knowledge, no work on text identification and recognition of Takri script has been reported so far. Our work therefore focuses on identifying and classifying the challenges the script poses, based on a comparative analysis of existing text segmentation approaches, since correct segmentation of text leads to more accurate machine recognition. Because no metal fonts were available for the script, machine-printed data must be collected to address the text identification problem in Takri script. The paper surveys different text segmentation approaches and, based on the structural properties of the script, applies them to Takri data in three steps: the Gurmukhi segmentation technique, the connected component segmentation approach, and the Gurmukhi touching characters segmentation approach. The results are analyzed for segmentation accuracy, and the challenges are identified along with their statistical analysis. The identified challenges, namely half-forms, numerous types of touching characters, and overlapping bounding boxes, are then classified. The effectiveness of this classification was evaluated using a Naïve Bayes machine learning algorithm. The results showed 80% accuracy in text identification and classification of Takri script.
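To make two of the terms above concrete, the following is a minimal sketch of connected component segmentation and of detecting overlapping bounding boxes on a binarized image. It is not the implementation used in this work: the function names, the choice of 8-connectivity, the breadth-first labeling, and the toy image are all assumptions made for illustration only.

```python
from collections import deque

def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of the ink regions.

    `image` is a list of rows holding 0 (background) or 1 (ink).
    8-connectivity is an assumption of this sketch.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def boxes_overlap(a, b):
    """True if two inclusive bounding boxes share at least one pixel position."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def overlapping_pairs(boxes):
    """Index pairs of components whose bounding boxes overlap."""
    return [(i, j)
            for i in range(len(boxes))
            for j in range(i + 1, len(boxes))
            if boxes_overlap(boxes[i], boxes[j])]

# Hypothetical toy image: an L-shaped stroke, a dot sitting inside the
# L's bounding box without touching it, and a separate vertical bar.
TOY_IMAGE = [
    [1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 1],
]

if __name__ == "__main__":
    boxes = connected_components(TOY_IMAGE)
    print(boxes)
    print(overlapping_pairs(boxes))
```

In the toy image the dot and the L-shaped stroke are separate components, yet their bounding boxes overlap, so a segmenter that cuts along box boundaries could not separate them cleanly. Touching characters are the opposite case: two glyphs that share ink pixels come out as a single component and need a further splitting step.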



We thank Padmashri Vijay Sharma, artist and Takri expert, Bhuri Singh Museum, Chamba, H.P., for sharing his valuable expertise on this rare script and for his immense help in gathering rare books and material written in the script. We also thank Dr Shiv Nirmohi, a Dogri writer, and Dr Sangeeta Sharma, State Archival Department, Jammu, for providing all the support needed for researching Takri script.

  • Connected component segmentation
  • Gurmukhi
  • Takri script
  • touching characters
  • half-forms
  • overlapping bounding boxes