POS Tagging and Structural Annotation of Handwritten Text Image Corpus of Devnagari Script

  • Maninder Singh Nehra
  • Neeta Nain
  • Mushtaq Ahmed
  • Deepa Modi
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 985)


Natural Language Processing (NLP) applications require large annotated benchmark datasets. Handwritten and printed text corpora play an important role in benchmarking pattern recognition algorithms. Part-of-speech (POS) tagging is one of the most common types of annotation, because it underpins many other linguistic annotations such as lemmatization, syntactic parsing, and semantic annotation. This paper describes POS tagging, together with structural annotation, of a handwritten text image corpus in Devnagari script comprising 1300 handwritten forms collected from different geographical locations and demographics.


Corpus · Hindi · Handwritten · POS and annotation


Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Maninder Singh Nehra (1)
  • Neeta Nain (1)
  • Mushtaq Ahmed (1)
  • Deepa Modi (2)
  1. Malaviya National Institute of Technology, Jaipur, India
  2. Swami Keshvanand Institute of Technology, Management and Gramothan, Jaipur, India
