N-Gram Features for Unsupervised WSD with an Underlying Naïve Bayes Model

The Naïve Bayes Model for Unsupervised Word Sense Disambiguation

Part of the book series: SpringerBriefs in Statistics ((BRIEFSSTATIST))

Abstract

The feature selection method presented in this chapter relies on web-scale N-gram counts: counts collected from the web are used to rank candidate features. Features are thus created from unlabeled data, a strategy that is part of a growing trend in natural language processing. Disambiguation results obtained with web N-gram feature selection are compared to those of previous approaches that likewise rely on an underlying Naïve Bayes model but use completely different feature sets. Test results for the main parts of speech (nouns, adjectives, verbs) show that web N-gram feature selection for the Naïve Bayes model is a reliable alternative to existing approaches, provided that a “quality list” of features, adapted to the part of speech, is used.
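
As a rough illustration of ranking candidate features by N-gram counts, the following Python sketch uses a tiny in-memory table of made-up counts in place of a web-scale corpus. The function name `rank_candidates`, the stopword filter, and the scoring by summed counts are all assumptions made for this example; they are not the procedure used in the chapter.

```python
from collections import Counter

# Toy stand-in for a web-scale N-gram table.
# These counts are invented for illustration; they are not real web data.
NGRAM_COUNTS = Counter({
    ("river", "bank"): 120,
    ("bank", "account"): 300,
    ("bank", "loan"): 180,
    ("the", "bank"): 900,
    ("bank", "erosion"): 15,
})

# Hypothetical stopword list; function words are excluded from the ranking.
STOPWORDS = {"the", "a", "of"}


def rank_candidates(target, ngram_counts, top_k=3, stopwords=STOPWORDS):
    """Return the top_k words that co-occur with `target` in the N-gram
    table, ranked by their summed N-gram counts (an assumed scoring rule)."""
    scores = Counter()
    for ngram, count in ngram_counts.items():
        if target not in ngram:
            continue
        for word in ngram:
            if word != target and word not in stopwords:
                scores[word] += count
    return [word for word, _ in scores.most_common(top_k)]


features = rank_candidates("bank", NGRAM_COUNTS)
print(features)
```

The selected words could then serve as the feature set of a Naïve Bayes model; with a real N-gram corpus the lookup would be backed by an on-disk index rather than an in-memory dictionary.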

Notes

  1. Which are the same as those considered in Chap. 3.

  2. Which are the same as those considered in Chaps. 3 and 4.

  3. Which are the same as those considered in Chap. 3.

  4. Reprinted here from (Preoţiuc and Hristea 2012).

  5. Reprinted here from (Preoţiuc and Hristea 2012).

  6. Reprinted here from (Preoţiuc and Hristea 2012).

  7. Reprinted here from (Preoţiuc and Hristea 2012).

  8. Together with Daniel Preoţiuc.

References

  • Bergsma, S., Lin, D., Goebel, R.: Web-scale N-gram models for lexical disambiguation. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence, pp. 1507–1512. Pasadena, California (2009)

  • Bergsma, S., Pitler, E., Lin, D.: Creating robust supervised classifiers via web-scale N-gram data. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL ’10), pp. 865–874. Uppsala, Sweden (2010)

  • Brants, T., Franz, A.: Web 1T 5-gram corpus version 1.1. Technical Report, Google Research (2006)

  • Brants, T., Franz, A.: Web 1T 5-gram, 10 European languages version 1. Technical Report, Linguistic Data Consortium, Philadelphia (2009)

  • Bruce, R., Wiebe, J., Pedersen, T.: The measure of a model. CoRR cmp-lg/9604018 (1996)

  • Chang, C.Y., Clark, S.: Linguistic steganography using automatically generated paraphrases. In: Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT ’10), pp. 591–599. Los Angeles, California (2010)

  • Hristea, F.: Recent advances concerning the usage of the Naïve Bayes model in unsupervised word sense disambiguation. Int. Rev. Comput. Softw. 4(1), 58–67 (2009)

  • Hristea, F., Popescu, M., Dumitrescu, M.: Performing word sense disambiguation at the border between unsupervised and knowledge-based techniques. Artif. Intell. Rev. 30(1), 67–86 (2008)

  • Hristea, F., Popescu, M.: Adjective sense disambiguation at the border between unsupervised and knowledge-based techniques. Fundam. Inform. 91(3–4), 547–562 (2009)

  • Islam, A., Inkpen, D.: Real-word spelling correction using Google Web 1T 3-grams. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP ’09), pp. 1241–1249. Singapore (2009)

  • Leacock, C., Towell, G., Voorhees, E.: Corpus-based statistical sense resolution. In: Proceedings of the ARPA Workshop on Human Language Technology, pp. 260–265. Princeton, New Jersey (1993)

  • Pedersen, T., Bruce, R.: Knowledge lean word-sense disambiguation. In: Proceedings of the 15th National Conference on Artificial Intelligence, pp. 800–805. Madison, Wisconsin (1998)

  • Preoţiuc-Pietro, D., Hristea, F.: Unsupervised word sense disambiguation with N-gram features. Artif. Intell. Rev. doi:10.1007/s10462-011-9306-y (2012)

  • Yuret, D.: KU: Word sense disambiguation by substitution. In: Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval ’07), pp. 207–214. Prague (2007)

Author information

Correspondence to Florentina T. Hristea.

Copyright information

© 2013 The Author(s)

About this chapter

Cite this chapter

Hristea, F.T. (2013). N-Gram Features for Unsupervised WSD with an Underlying Naïve Bayes Model. In: The Naïve Bayes Model for Unsupervised Word Sense Disambiguation. SpringerBriefs in Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33693-5_5
