Comparison of different POS Tagging Techniques (n-gram, HMM and Brill’s tagger) for Bangla

  • Conference paper
Advances and Innovations in Systems, Computing Sciences and Software Engineering

Abstract

Part-of-speech (POS) tagging is the task of assigning a part-of-speech tag to each word of a text, and there are several approaches to it. In this paper we compare the performance of several POS tagging techniques for Bangla: statistical approaches (n-gram, HMM) and a transformation-based approach (Brill’s tagger). A supervised POS tagger needs a large annotated training corpus to tag accurately. At this initial stage of POS tagging for Bangla, only a very limited annotated corpus is available, so we examine which technique performs best with this limited resource. We also measure performance on English to estimate how these techniques might perform for Bangla if a substantial annotated corpus becomes available.
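To make the n-gram approach concrete, the following is a minimal sketch of a bigram tagger that backs off to a unigram tagger and then to a default tag. It is an illustration only and not the implementation evaluated in the paper: the function names (`train`, `tag`, `accuracy`), the toy English sentences, the Penn-style tags, and the default tag `NN` are all assumptions chosen for the example.

```python
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Count tags per word (unigram) and per (previous tag, word) pair (bigram)."""
    unigram = defaultdict(Counter)
    bigram = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev_tag = "<s>"  # sentence-start marker (assumed convention)
        for word, tag_ in sentence:
            unigram[word][tag_] += 1
            bigram[(prev_tag, word)][tag_] += 1
            prev_tag = tag_
    return unigram, bigram


def tag(words, unigram, bigram, default_tag="NN"):
    """Tag left to right: try bigram context, then unigram, then the default tag."""
    tagged = []
    prev_tag = "<s>"
    for word in words:
        if (prev_tag, word) in bigram:
            best = bigram[(prev_tag, word)].most_common(1)[0][0]
        elif word in unigram:
            best = unigram[word].most_common(1)[0][0]
        else:
            best = default_tag  # unseen word: fall back to a fixed guess
        tagged.append((word, best))
        prev_tag = best
    return tagged


def accuracy(gold_sentences, unigram, bigram):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sentence in gold_sentences:
        words = [w for w, _ in sentence]
        predicted = tag(words, unigram, bigram)
        for (_, gold), (_, pred) in zip(sentence, predicted):
            correct += gold == pred
            total += 1
    return correct / total


# Toy data (hypothetical, for illustration only).
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("dogs", "NNS"), ("bark", "VBP")],
]
TEST = [[("the", "DT"), ("cat", "NN"), ("barks", "VBZ")]]

unigram, bigram = train(TRAIN)
result = tag(["the", "cat", "barks"], unigram, bigram)
score = accuracy(TEST, unigram, bigram)
```

With a small annotated corpus, many (previous tag, word) pairs are never observed, so the backoff chain does most of the work. This is one reason corpus size matters so much when comparing supervised taggers.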

Copyright information

© 2007 Springer

About this paper

Cite this paper

Hasan, F.M., UzZaman, N., Khan, M. (2007). Comparison of different POS Tagging Techniques (n-gram, HMM and Brill’s tagger) for Bangla. In: Elleithy, K. (eds) Advances and Innovations in Systems, Computing Sciences and Software Engineering. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6264-3_23

  • DOI: https://doi.org/10.1007/978-1-4020-6264-3_23

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-6263-6

  • Online ISBN: 978-1-4020-6264-3

  • eBook Packages: Engineering, Engineering (R0)
