Abstract
The feature selection method presented in this chapter relies on web-scale N-gram counts: counts collected from the web are used to rank candidate features. Features are thus created from unlabeled data, a strategy that is part of a growing trend in natural language processing. Disambiguation results obtained by web N-gram feature selection are compared with those of previous approaches that likewise rely on an underlying Naïve Bayes model but use completely different feature sets. Test results for the main parts of speech (nouns, adjectives, verbs) show that web N-gram feature selection for the Naïve Bayes model is a reliable alternative to existing approaches, provided that a “quality list” of features, adapted to the part of speech, is used.
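The idea of ranking candidates by N-gram counts can be illustrated with a minimal sketch. It is not the chapter's actual procedure: the toy count table, the stopword list, the scoring rule (summing the counts of all N-grams in which a word co-occurs with the target) and the function names `rank_candidates` and `select_features` are all assumptions made for illustration only.

```python
from collections import defaultdict

# Toy stand-in for web-scale N-gram counts (hypothetical numbers, not real data).
TOY_NGRAM_COUNTS = {
    ("power", "line"): 900,
    ("line", "of", "credit"): 700,
    ("phone", "line"): 1200,
    ("line", "up"): 400,
    ("the", "line"): 5000,
    ("product", "line"): 800,
    ("credit", "card"): 3000,  # does not contain the target word, so it is ignored
}

# Hypothetical stopword list; function words are not useful as features.
STOPWORDS = frozenset({"the", "of", "a", "an", "up"})


def rank_candidates(target, ngram_counts, stopwords=frozenset()):
    """Rank words that co-occur with `target` in N-grams by their summed count."""
    scores = defaultdict(int)
    for ngram, count in ngram_counts.items():
        if target not in ngram:
            continue
        for word in ngram:
            if word != target and word not in stopwords:
                scores[word] += count
    # Sort by descending count, then alphabetically for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def select_features(target, ngram_counts, k, stopwords=frozenset()):
    """Keep the k highest-ranked candidates as the feature set (assumed cutoff rule)."""
    ranked = rank_candidates(target, ngram_counts, stopwords)
    return [word for word, _ in ranked[:k]]


if __name__ == "__main__":
    print(select_features("line", TOY_NGRAM_COUNTS, k=3, stopwords=STOPWORDS))
```

In this sketch the selected words would then serve as the features of the Naïve Bayes model; the cutoff `k` and any part-of-speech-specific filtering are left open, since the abstract does not specify them.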
Notes
- 1. Which are the same as those considered in Chap. 3.
- 2.
- 3. Which are the same as those considered in Chap. 3.
- 4. Reprinted here from (Preoţiuc and Hristea 2012).
- 5. Reprinted here from (Preoţiuc and Hristea 2012).
- 6. Reprinted here from (Preoţiuc and Hristea 2012).
- 7. Reprinted here from (Preoţiuc and Hristea 2012).
- 8. Together with Daniel Preoţiuc.
References
Bergsma, S., Lin, D., Goebel, R.: Web-scale N-gram models for lexical disambiguation. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence, pp. 1507–1512. Pasadena, California (2009)
Bergsma, S., Pitler, E., Lin, D.: Creating robust supervised classifiers via web-scale N-gram data. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL ’10), pp. 865–874. Uppsala, Sweden (2010)
Brants, T., Franz, A.: Web 1T 5-gram corpus version 1.1. Technical Report, Google Research (2006)
Brants, T., Franz, A.: Web 1T 5-gram, 10 European languages version 1. Technical Report, Linguistic Data Consortium, Philadelphia (2009)
Bruce, R., Wiebe, J., Pedersen, T.: The measure of a model. CoRR cmp-lg/9604018 (1996)
Chang, C.Y., Clark, S.: Linguistic steganography using automatically generated paraphrases. In: Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT ’10), pp. 591–599. Los Angeles, California (2010)
Hristea, F.: Recent advances concerning the usage of the Naïve Bayes model in unsupervised word sense disambiguation. Int. Rev. Comput. Softw. 4(1), 58–67 (2009)
Hristea, F., Popescu, M., Dumitrescu, M.: Performing word sense disambiguation at the border between unsupervised and knowledge-based techniques. Artif. Intell. Rev. 30(1), 67–86 (2008)
Hristea, F., Popescu, M.: Adjective sense disambiguation at the border between unsupervised and knowledge-based techniques. Fundam. Inform. 91(3–4), 547–562 (2009)
Islam, A., Inkpen, D.: Real-word spelling correction using Google Web 1T 3-grams. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP ’09), pp. 1241–1249. Singapore (2009)
Leacock, C., Towell, G., Voorhees, E.: Corpus-based statistical sense resolution. In: Proceedings of the ARPA Workshop on Human Language Technology, pp. 260–265. Princeton, New Jersey (1993)
Pedersen, T., Bruce, R.: Knowledge lean word-sense disambiguation. In: Proceedings of the 15th National Conference on Artificial Intelligence, pp. 800–805. Madison, Wisconsin (1998)
Preoţiuc-Pietro, D., Hristea, F.: Unsupervised word sense disambiguation with N-gram features. Artif. Intell. Rev. doi:10.1007/s10462-011-9306-y (2012)
Yuret, D.: KU: Word sense disambiguation by substitution. In: Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval ’07), pp. 207–214. Prague (2007)
Copyright information
© 2013 The Author(s)
About this chapter
Cite this chapter
Hristea, F.T. (2013). N-Gram Features for Unsupervised WSD with an Underlying Naïve Bayes Model. In: The Naïve Bayes Model for Unsupervised Word Sense Disambiguation. SpringerBriefs in Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33693-5_5
DOI: https://doi.org/10.1007/978-3-642-33693-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33692-8
Online ISBN: 978-3-642-33693-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)