N-Gram Features for Unsupervised WSD with an Underlying Naïve Bayes Model

Hristea, Florentina T.

doi:10.1007/978-3-642-33693-5_5

Part of the book series: SpringerBriefs in Statistics (BRIEFSSTATIST)

Abstract

The feature selection method presented in this chapter relies on web-scale N-gram counts: counts collected from the web are used to rank candidate features. Features are thus created from unlabeled data, a strategy that is part of a growing trend in natural language processing. Disambiguation results obtained with web N-gram feature selection are compared to those of previous approaches that likewise rely on an underlying Naïve Bayes model but use completely different feature sets. Test results for the main parts of speech (nouns, adjectives, verbs) show that web N-gram feature selection for the Naïve Bayes model is a reliable alternative to other existing approaches, provided that a “quality list” of features, adapted to the part of speech, is used.
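
The idea of ranking candidates by N-gram counts can be illustrated with a minimal sketch. This is not the chapter's actual procedure, which the abstract does not spell out. The count table `TOY_NGRAM_COUNTS`, the function `rank_candidate_features`, and the target word "line" are all hypothetical, and the counts are invented stand-ins for a web-scale resource.

```python
from collections import Counter

# Hypothetical toy counts standing in for a web-scale N-gram resource.
# The numbers are invented for illustration only.
TOY_NGRAM_COUNTS = {
    ("power", "line"): 900,
    ("line", "of", "text"): 400,
    ("telephone", "line"): 700,
    ("line", "up"): 300,
    ("product", "line"): 650,
    ("the", "cat"): 5000,
}


def rank_candidate_features(target, ngram_counts, top_k=3):
    """Rank words that co-occur with `target` in N-grams by total count."""
    scores = Counter()
    for ngram, count in ngram_counts.items():
        if target not in ngram:
            continue  # only N-grams containing the target word contribute
        for word in ngram:
            if word != target:
                scores[word] += count
    # Sort by descending count, then alphabetically to break ties.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]


features = rank_candidate_features("line", TOY_NGRAM_COUNTS)
print(features)
```

In this sketch the selected words would then serve as the feature set of a Naïve Bayes model. A real setup would presumably also filter the candidates, for example by part of speech, in the spirit of the “quality list” mentioned above.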

Notes

1. Which are the same as those considered in Chap. 3.
2. Which are the same as those considered in Chaps. 3 and 4.
3. Which are the same as those considered in Chap. 3.
4. Reprinted here from (Preoţiuc and Hristea 2012).
5. Reprinted here from (Preoţiuc and Hristea 2012).
6. Reprinted here from (Preoţiuc and Hristea 2012).
7. Reprinted here from (Preoţiuc and Hristea 2012).
8. Together with Daniel Preoţiuc.

References

Bergsma, S., Lin, D., Goebel, R.: Web-scale N-gram models for lexical disambiguation. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence, pp. 1507–1512. Pasadena, California (2009)
Bergsma, S., Pitler, E., Lin, D.: Creating robust supervised classifiers via web-scale N-gram data. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL ’10), pp. 865–874. Uppsala, Sweden (2010)
Brants, T., Franz, A.: Web 1T 5-gram corpus version 1.1. Technical Report, Google Research (2006)
Brants, T., Franz, A.: Web 1T 5-gram, 10 European languages version 1. Technical Report, Linguistic Data Consortium, Philadelphia (2009)
Bruce, R., Wiebe, J., Pedersen, T.: The Measure of a Model, CoRR, cmp-lg/9604018 (1996)
Chang, C.Y., Clark, S.: Linguistic steganography using automatically generated paraphrases. In: Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT ’10), pp. 591–599. Los Angeles, California (2010)
Hristea, F.: Recent advances concerning the usage of the Naïve Bayes model in unsupervised word sense disambiguation. Int. Rev. Comput. Softw. 4(1), 58–67 (2009)
Hristea, F., Popescu, M., Dumitrescu, M.: Performing word sense disambiguation at the border between unsupervised and knowledge-based techniques. Artif. Intell. Rev. 30(1), 67–86 (2008)
Hristea, F., Popescu, M.: Adjective sense disambiguation at the border between unsupervised and knowledge-based techniques. Fundam. Inform. 91(3–4), 547–562 (2009)
Islam, A., Inkpen, D.: Real-word spelling correction using Google Web 1T 3-grams. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP ’09), pp. 1241–1249. Singapore (2009)
Leacock, C., Towell, G., Voorhees, E.: Corpus-based statistical sense resolution. In: Proceedings of the ARPA Workshop on Human Language Technology, pp. 260–265. Princeton, New Jersey (1993)
Pedersen, T., Bruce, R.: Knowledge lean word-sense disambiguation. In: Proceedings of the 15th National Conference on Artificial Intelligence, pp. 800–805. Madison, Wisconsin (1998)
Preoţiuc-Pietro, D., Hristea, F.: Unsupervised word sense disambiguation with N-gram features. Artif. Intell. Rev. doi:10.1007/s10462-011-9306-y (2012)
Yuret, D.: KU: Word sense disambiguation by substitution. In: Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval ’07), pp. 207–214. Prague (2007)

Author information

Authors and Affiliations

Faculty of Mathematics and Computer Science, Department of Computer Science, University of Bucharest, Emil Racovita 12, 041758, Bucharest, Romania
Florentina T. Hristea

Corresponding author

Correspondence to Florentina T. Hristea.

About this chapter

Cite this chapter

Hristea, F.T. (2013). N-Gram Features for Unsupervised WSD with an Underlying Naïve Bayes Model. In: The Naïve Bayes Model for Unsupervised Word Sense Disambiguation. SpringerBriefs in Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33693-5_5

DOI: https://doi.org/10.1007/978-3-642-33693-5_5
Published: 08 November 2012
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33692-8
Online ISBN: 978-3-642-33693-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
