“Part of Speech Tagging – A Corpus Based Approach”

Rashmi, S.; Hanumanthappa, M.

doi:10.1007/978-981-10-3433-6_11


Part of the book series: Communications in Computer and Information Science (CCIS, volume 628)

Included in the following conference series:

International Conference on Smart Trends for Information Technology and Computer Communications


Abstract

POS tagging, an ideal way to augment a corpus, is an essential abstraction for text mining. However, as linguistic errors and characteristic forms of language ambiguity increase, the data filtered by POS tagging becomes noisier. In this paper, probabilistic tagging and tagging based on Markov models are combined to estimate the association probabilities, and an error estimation model is defined on the basis of this combined approach. A comparative study is carried out on several corpora available in NLTK, such as Crubadan, Brown and INSPEC. The proposed methodology reaches an accuracy of about 98%, compared with an average of about 96% for the existing algorithms. The error ratio of the maximum-likelihood estimation is plotted as the performance measure.
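The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, assuming a first-order hidden Markov model whose transition and emission probabilities are maximum-likelihood estimates (relative frequencies) from a tagged corpus, decoded with the Viterbi algorithm. The toy corpus, the tag names and the helpers `train_mle`, `viterbi` and `error_ratio` are hypothetical, and the error ratio is assumed to be the fraction of mistagged tokens. No smoothing is applied, so words never seen in training receive zero probability. It is not the authors' combined model or their error estimation model.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start pseudo-tag

def _normalise(table):
    """Turn nested counts into relative frequencies (maximum-likelihood estimates)."""
    probs = {}
    for key, counts in table.items():
        total = sum(counts.values())
        probs[key] = {x: c / total for x, c in counts.items()}
    return probs

def train_mle(tagged_sents):
    """Estimate P(tag | previous tag) and P(word | tag) from (word, tag) sentences."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] = count
    emit = defaultdict(Counter)   # emit[tag][word] = count
    for sent in tagged_sents:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
    return _normalise(trans), _normalise(emit)

def _log(p):
    # Log-probabilities avoid underflow; a zero probability becomes -inf.
    return math.log(p) if p > 0 else float("-inf")

def viterbi(words, trans, emit):
    """Return the most probable tag sequence under the (unsmoothed) HMM."""
    tags = list(emit)
    words = [w.lower() for w in words]
    # best[i][t] = (log-prob of the best path ending in tag t at position i, previous tag)
    best = [{t: (_log(trans.get(START, {}).get(t, 0)) + _log(emit[t].get(words[0], 0)), None)
             for t in tags}]
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p][0] + _log(trans.get(p, {}).get(t, 0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = (score + _log(emit[t].get(words[i], 0)), prev)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]

def error_ratio(gold, predicted):
    """Fraction of tokens whose predicted tag differs from the gold tag."""
    wrong = sum(g != p for g, p in zip(gold, predicted))
    return wrong / len(gold)

# Hypothetical toy corpus; "saw" is tagged both NOUN and VERB.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("saw", "VERB"), ("a", "DET"), ("cat", "NOUN")],
    [("the", "DET"), ("saw", "NOUN"), ("cuts", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
]

if __name__ == "__main__":
    trans, emit = train_mle(TRAIN)
    pred = viterbi("the cat saw a dog".split(), trans, emit)
    print(pred)  # ['DET', 'NOUN', 'VERB', 'DET', 'NOUN']
    print(error_ratio(["DET", "NOUN", "VERB", "DET", "NOUN"], pred))  # 0.0
```

In the example the ambiguous word "saw" is resolved to VERB because the Markov transition NOUN → VERB is the only one seen after a noun. On a real corpus one could pass, for example, NLTK's `brown.tagged_sents()` to `train_mle`, but an unsmoothed model would then also need a strategy for unseen words.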



Author information

Authors and Affiliations

Department of Computer Science and Applications, Bangalore University, Bangalore, 560056, India
S. Rashmi & M. Hanumanthappa


Corresponding author

Correspondence to S. Rashmi.

Editor information

Editors and Affiliations

Stanford University, Stanford, CA, USA
Aynur Unal
IT Buzz Limited, Dagenham, UK
Malaya Nayak
Microsoft Innovation Centre, Sri Aurobindo Institute of Technology, Indore, Madhya Pradesh, India
Durgesh Kumar Mishra
Namibia University of Science and Technology, Windhoek, Namibia
Dharm Singh
Sabar Institute of Technology, Sabarkantha, India
Amit Joshi


About this paper

Cite this paper

Rashmi, S., Hanumanthappa, M. (2016). “Part of Speech Tagging – A Corpus Based Approach”. In: Unal, A., Nayak, M., Mishra, D.K., Singh, D., Joshi, A. (eds) Smart Trends in Information Technology and Computer Communications. SmartCom 2016. Communications in Computer and Information Science, vol 628. Springer, Singapore. https://doi.org/10.1007/978-981-10-3433-6_11


DOI: https://doi.org/10.1007/978-981-10-3433-6_11
Published: 27 December 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3432-9
Online ISBN: 978-981-10-3433-6
eBook Packages: Computer Science, Computer Science (R0)
