Typing Relations in Distributional Thesauri

Language Production, Cognition, and the Lexicon

Part of the book series: Text, Speech and Language Technology (TLTB, volume 48)

Abstract

Dictionaries are important tools for language producers, but they are rarely organized to provide easy access to words from concepts. Such access can be facilitated by including relations between words in dictionaries, which makes associative lookup possible. Lexical associations can be extracted quite easily from a corpus as first- or second-order co-occurrence relations. However, these associations face two related problems: they are noisy, and the type of the relations on which they are based is implicit. In this article, we propose to address the second problem, to some extent, by studying the types of relations that can be found in distributional thesauri. More precisely, this study relies on a reference lexical network, WordNet in our case, in which the type of each relation is known. This reference network is first used to identify directly the relations of the thesauri that are present in it, and then to characterize, through the detection of patterns of composition of known relations, new kinds of relations that do not appear in it explicitly.
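The idea of typing a thesaurus relation against a reference network can be illustrated with a minimal sketch. It is not the chapter's implementation: the tiny network `EDGES`, the relation names, and the function `type_relation` are all hypothetical stand-ins, and a real experiment would query WordNet instead. The sketch looks for the shortest sequence of known relations linking a thesaurus entry to one of its neighbors; a sequence of length one is a relation present in the network, while a longer one is a composition of known relations.

```python
from collections import deque

# Hypothetical toy network standing in for WordNet: (source, relation, target).
EDGES = [
    ("car", "hypernym", "vehicle"),
    ("truck", "hypernym", "vehicle"),
    ("car", "synonym", "automobile"),
    ("wheel", "part_of", "car"),
]

# Assumed inverse of each relation, so that edges can be followed both ways.
INVERSE = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "synonym": "synonym",
    "part_of": "has_part",
    "has_part": "part_of",
}


def build_graph(edges):
    """Index the typed edges by word, adding the inverse of every edge."""
    graph = {}
    for source, relation, target in edges:
        graph.setdefault(source, []).append((relation, target))
        graph.setdefault(target, []).append((INVERSE[relation], source))
    return graph


def type_relation(graph, entry, neighbor, max_len=2):
    """Return the shortest relation sequence from entry to neighbor, or None.

    The search is breadth-first and limited to max_len elementary relations.
    """
    queue = deque([(entry, ())])
    visited = {entry}
    while queue:
        word, path = queue.popleft()
        if word == neighbor and path:
            return path
        if len(path) == max_len:
            continue
        for relation, next_word in graph.get(word, []):
            if next_word not in visited:
                visited.add(next_word)
                queue.append((next_word, path + (relation,)))
    return None


GRAPH = build_graph(EDGES)
direct = type_relation(GRAPH, "car", "automobile")   # relation found as is
composed = type_relation(GRAPH, "car", "truck")      # composition of relations
untyped = type_relation(GRAPH, "wheel", "truck")     # no path within max_len
```

In this toy setting, a pair such as car and truck is not linked directly but through a hypernym step followed by a hyponym step, which is the kind of composition pattern the abstract refers to. The bound `max_len` is an arbitrary choice made for the example.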

Notes

  1. The experiments were performed on 24 months of the French newspaper Le Monde.

  2. Although the Moby thesaurus does not contain only synonyms, we will sometimes use the term synonym to refer to all the words associated with one of its entries.

  3. http://webdocs.cs.ualberta.ca/lindek/Downloads/sim.tgz.

  4. As for A2ST, the weighting function of co-occurrents was PMI, only the co-occurrents with a single occurrence were filtered out, and the Cosine measure was applied for comparing distributional contexts.

  5. http://nlp.stanford.edu/~socherr/ACL2012_wordVectorsTextFile.zip.

  6. We particularly thank Adrian Popescu for giving us access to these data.

  7. A path type is made of a sequence of elementary relations, while a path occurrence also includes the specific words that are linked.
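The configuration mentioned in note 4 (PMI weighting of co-occurrents, removal of co-occurrents seen only once, Cosine comparison of contexts) can be sketched as follows. This is an illustration under assumptions, not the code used for the thesauri: the function names `pmi_vectors` and `cosine`, the toy counts, and the choice to compute the marginal totals after filtering are all hypothetical.

```python
import math
from collections import Counter


def pmi_vectors(cooc, min_count=2):
    """Build PMI-weighted context vectors from co-occurrence counts.

    cooc maps (word, co-occurrent) pairs to counts. Pairs seen fewer than
    min_count times are dropped (assumption: the filter applies per pair).
    """
    kept = {pair: n for pair, n in cooc.items() if n >= min_count}
    total = sum(kept.values())
    word_total = Counter()
    context_total = Counter()
    for (word, context), n in kept.items():
        word_total[word] += n
        context_total[context] += n
    vectors = {}
    for (word, context), n in kept.items():
        # PMI = log2( P(word, context) / (P(word) * P(context)) )
        pmi = math.log2(n * total / (word_total[word] * context_total[context]))
        vectors.setdefault(word, {})[context] = pmi
    return vectors


def cosine(u, v):
    """Cosine measure between two sparse vectors stored as dictionaries."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy counts; ("car", "banana") occurs once and is filtered out.
COUNTS = {
    ("car", "drive"): 4,
    ("car", "road"): 2,
    ("truck", "drive"): 2,
    ("truck", "road"): 4,
    ("car", "banana"): 1,
}
VECTORS = pmi_vectors(COUNTS)
similarity = cosine(VECTORS["car"], VECTORS["truck"])
```

Ranking, for each entry, the other words by decreasing `cosine` value would give the list of neighbors that makes up a distributional thesaurus entry.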

Author information

Corresponding author

Correspondence to Olivier Ferret.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Ferret, O. (2015). Typing Relations in Distributional Thesauri. In: Gala, N., Rapp, R., Bel-Enguix, G. (eds) Language Production, Cognition, and the Lexicon. Text, Speech and Language Technology, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-08043-7_8

  • DOI: https://doi.org/10.1007/978-3-319-08043-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08042-0

  • Online ISBN: 978-3-319-08043-7

  • eBook Packages: Computer Science (R0)
