Semantic WordNet-Based Feature Selection

Part of the book series: SpringerBriefs in Statistics ((BRIEFSSTATIST))

Abstract

The feature selection method presented in this chapter uses the semantic network WordNet as a knowledge source for feature selection. The method makes ample use of the WordNet semantic relations that are typical of each part of speech, thus placing the disambiguation process at the border between unsupervised and knowledge-based techniques. Test results corresponding to the main parts of speech (nouns, adjectives, verbs) will be compared to previously existing disambiguation results, obtained when performing a completely different type of feature selection. Our main conclusion will be that the Naïve Bayes model reacts well in the presence of semantic knowledge provided by WN-based feature selection when acting as a clustering technique for unsupervised WSD.
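The core idea, that only context words which WordNet relates to the target's senses enter the feature set of the Naïve Bayes model, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the vocabulary below is invented, whereas in the method itself it is assembled from WordNet relations (synonyms, glosses, hyponyms, and so on).

```python
# Sketch of WN-based feature selection for a target word such as "line".
# WN_VOCABULARY stands in for the disambiguation vocabulary that the
# method builds from WordNet relations; the entries here are invented.
WN_VOCABULARY = {"cord", "wire", "telephone", "call",
                 "product", "products", "goods", "text"}

def select_features(context_words, vocabulary=WN_VOCABULARY):
    """Keep only the context words that occur in the WN-derived vocabulary."""
    return [w for w in context_words if w.lower() in vocabulary]

instance = "the company introduced a new line of products this fall".split()
print(select_features(instance))  # ['products']
```

Only the surviving words participate in parameter estimation, which is how the semantic knowledge reaches the otherwise knowledge-lean Naïve Bayes model.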


Notes

  1.

    See the mathematical model presented in Chap. 2.

  2.

    In 1986 George Miller took the initiative of creating WordNet and designed its structure, which was meant to serve for testing current theories concerning human semantic memory. Verbs were added to the network the following year (1987), and its first version (1.0) was released in 1991. By 2006 approximately 8,000 downloads were being registered daily, and similar, more or less developed, WordNet-type semantic networks existed for some 40 other languages.

  3.

    The Collins and Quillian model proposed a hierarchical structure of concepts, where more specific concepts inherit information from their superordinate, more general concepts. That is why only knowledge particular to more specific concepts needs to be stored with such concepts.

  4.

    For a comprehensive description of WN see also Fellbaum (1998).

  5.

    WordNet divides adjectives into two major classes: descriptive and relational. Descriptive adjectives are organized into clusters on the basis of binary opposition (antonymy) and similarity of meaning (Fellbaum 1998). Descriptive adjectives that do not have direct antonyms are said to have indirect antonyms by virtue of their semantic similarity to adjectives that do have direct antonyms. Relational adjectives are assumed to be stylistic variants of modifying nouns and are cross-referenced to the noun files (see the relation “relating-or-pertaining-to”). The function such adjectives play is usually that of classifying their head nouns (Fellbaum 1998).

  6.

    The entailment relation between verbs resembles meronymy between nouns, but meronymy is better suited to nouns than to verbs (Fellbaum 1998).

  7.

    The causal relation (Fellbaum 1998) picks out two verb concepts, one causative (like give), the other what might be called the “resultative” (like have).

  8.

    See the mathematical model presented in Chap. 2.

  9.

    Experiment referred to in Tables 3.5 and 3.7 as “Synonyms + Glosses”.

  10.

    Experiment referred to in Tables 3.5 and 3.7 as “+Hyponyms + Meronyms”.

  11.

    Experiment referred to in Tables 3.5 and 3.7 as “+Hyponyms + Glosses + Meronyms + Glosses”.

  12.

    Experiment referred to in Tables 3.5 and 3.7 as “+Hyponyms + Hypernyms + Meronyms + Holonyms”.

  13.

    Experiment referred to in Tables 3.5 and 3.7 as “+Hyponyms + Glosses + Hypernyms + Glosses + Meronyms + Glosses + Holonyms + Glosses”.

  14.

    This is synset {common} having the gloss ‘belonging to or participated in by a community as a whole; public’.

  15.

    This is synset {common, mutual} having the gloss ‘common to or shared by two or more parties’.

  16.

    This is synset {common} having the gloss ‘to be expected; standard’.

  17.

    This is synset {common, usual} having the gloss ‘commonly encountered’.

  18.

    This is synset {public} having the gloss ‘affecting the people or community as a whole’.

  19.

    This is synset {public} having the gloss ‘not private; open to or concerning the people as a whole’.

  20.

    Referred to in Tables 3.8 and 3.9 as “all”.

  21.

    Referred to in Tables 3.8 and 3.9 as “all-antonyms”.

  22.

    These are the following:

    • synset {help, aid} having the ID 200082081 and the gloss ‘improve the condition of’;

    • synset {help} having the ID 200206998 and the gloss ‘improve; change for the better’;

    • synset {serve, help} having the ID 201181295 and the gloss ‘help to some food; help with food or drink’;

    • synset {avail, help} having the ID 201193569 and the gloss ‘take or use’;

    • synset {help, assist, aid} having the ID 202547586 and the gloss ‘give help or assistance; be of service’;

    • synset {help} having the ID 202555434 and the gloss ‘contribute to the furtherance of’.

  23.

    In order to conduct their experiments, the authors mentioned chose a number of sense groups equal to the number of sense tags existing in the corpus. Therefore \(K!\) possible mappings (with \(K\) denoting the number of senses of the target word) must be taken into account. For a fixed mapping, its accuracy is given by the number of correct labellings (those identical to the corresponding corpus sense tags) divided by the total number of instances. Of the \(K!\) possible mappings, the one with maximum accuracy is chosen.
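This maximum-accuracy mapping step can be sketched as follows (an illustrative implementation, not the cited authors' code):

```python
from itertools import permutations

def best_mapping_accuracy(cluster_labels, gold_tags, cluster_ids, sense_tags):
    """Evaluate all K! cluster-to-sense mappings; return the best accuracy."""
    best = 0.0
    for perm in permutations(sense_tags):
        mapping = dict(zip(cluster_ids, perm))  # one of the K! mappings
        correct = sum(mapping[c] == g for c, g in zip(cluster_labels, gold_tags))
        best = max(best, correct / len(gold_tags))
    return best

# Six instances grouped into clusters 0-2, with corpus sense tags s1-s3.
clusters = [0, 0, 1, 1, 1, 2]
gold = ["s1", "s1", "s2", "s2", "s1", "s3"]
print(best_mapping_accuracy(clusters, gold, [0, 1, 2], ["s1", "s2", "s3"]))  # 5/6
```

Exhaustive search over \(K!\) permutations is feasible only for the small sense inventories typical of WSD targets; for larger \(K\) the optimal assignment is usually found with the Hungarian algorithm instead.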

  24.

    Reprinted here from Hristea et al. (2008).

  25.

    The choice of this context window size is based on the suggestion of Lesk (1986) that the quantity of data available to the algorithm is one of the biggest factors to influence the quality of disambiguation. In this case, a larger context window allows the occurrence of a greater number of WN relevant words (with respect to the target), which are the only ones to participate in the creation of the disambiguation vocabulary.

  26.

    See the mathematical model presented in Chap. 2.

  27.

    Sense “product” occurs in 53.47 % of the line corpus instances.

  28.

    Reprinted here from Hristea et al. (2008).

  29.

    Pedersen and Bruce (1998) also make use of Gibbs sampling for parameter estimation, without significantly improving the results.

References

  • Banerjee, S., Pedersen, T.: An adapted Lesk algorithm for word sense disambiguation using WordNet. In: Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136–145, Mexico City (2002)


  • Banerjee, S., Pedersen, T.: Extended gloss overlaps as a measure of semantic relatedness. In: Proceedings of the 18th International Joint Conference on Artificial Intelligence, pp. 805–810, Acapulco, Mexico (2003)


  • Bruce, R., Wiebe, J., Pedersen, T.: The measure of a model. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 101–112, Philadelphia, PA (1996)


  • Collins, A.M., Quillian, M.R.: Retrieval time from semantic memory. J. Verbal Learn. Verbal Behav. 8, 240–247 (1969)


  • Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. The MIT Press, Cambridge (1998)


  • Hristea, F.: Recent advances concerning the usage of the Naïve Bayes model in unsupervised word sense disambiguation. Int. Rev. Comput. Softw. 4(1), 58–67 (2009)


  • Hristea, F., Popescu, M., Dumitrescu, M.: Performing word sense disambiguation at the border between unsupervised and knowledge-based techniques. Artif. Intell. Rev. 30(1), 67–86 (2008)


  • Hristea, F., Popescu, M.: Adjective sense disambiguation at the border between unsupervised and knowledge-based techniques. Fundam. Inf. 91(3–4), 547–562 (2009)


  • Kay, M.: The concrete lexicon and the abstract dictionary. In: Proceedings of the Fifth Annual Conference of the UW Centre for the New Oxford English Dictionary, pp. 35–41 (1989)


  • Leacock, C., Towell, G., Voorhees, E.: Corpus-based statistical sense resolution. In: Proceedings of the ARPA Workshop on Human Language Technology, pp. 260–265, Princeton, New Jersey (1993)


  • Miller, G.A.: Nouns in WordNet: a lexical inheritance system. Int. J. Lexicogr. 3(4), 245–264 (1990)


  • Miller, G.A.: WordNet: a lexical database. Commun. ACM 38(11), 39–41 (1995)


  • Miller, G.A.: Nouns in WordNet. In: Fellbaum, C. (ed.) WordNet: An Electronic Lexical Database, pp. 23–46. The MIT Press, Cambridge (1998)


  • Miller, G.A., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.: WordNet: an on-line lexical database. Int. J. Lexicogr. 3(4), 234–244 (1990)


  • Miller, G.A., Hristea, F.: WordNet nouns: classes and instances. Comput. Linguist. 32(1), 1–3 (2006)


  • Pedersen, T., Bruce, R.: Knowledge lean word-sense disambiguation. In: Proceedings of the 15th National Conference on Artificial Intelligence, pp. 800–805, Madison, Wisconsin (1998)


Author information

Correspondence to Florentina T. Hristea.

Copyright information

© 2013 The Author(s)

Cite this chapter

Hristea, F.T. (2013). Semantic WordNet-Based Feature Selection. In: The Naïve Bayes Model for Unsupervised Word Sense Disambiguation. SpringerBriefs in Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33693-5_3
