Typing Relations in Distributional Thesauri

Ferret, Olivier

doi:10.1007/978-3-319-08043-7_8

Part of the book series: Text, Speech and Language Technology (TLTB, volume 48)


Abstract

Dictionaries are important tools for language producers, but they are rarely organized to give easy access to words from concepts. Such access can be facilitated by adding relations between words to dictionaries, which makes associative lookup possible. Lexical associations can be extracted from a corpus fairly easily as first- or second-order co-occurrence relations. However, these associations face two related problems: they are noisy, and the type of relation on which they are based is implicit. In this article, we partially address the second problem by studying the types of relations found in distributional thesauri. More precisely, this study relies on a reference lexical network, WordNet in our case, in which relation types are known. This reference network is used first to directly identify the thesaurus relations that are present in it, and second to characterize, by detecting patterns in which known relations are composed, new kinds of relations that do not appear explicitly in it.
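The two uses of the reference network described in the abstract (direct lookup of a relation, and detection of compositions of known relations) can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes a hypothetical toy network of (word, relation, word) triples standing in for WordNet, illustrative relation labels, hypothetical function names (`build_graph`, `path_occurrences`, `path_types`), and a limit of two relations per path. A path made of one relation is a direct relation; a longer sequence of relation labels is a path type, while the list of traversed words is a path occurrence.

```python
from collections import Counter, deque

# Hypothetical toy reference network standing in for WordNet:
# (word, relation, word) triples with illustrative relation labels.
EDGES = [
    ("car", "hypernym", "vehicle"),
    ("truck", "hypernym", "vehicle"),
    ("car", "meronym", "wheel"),
    ("truck", "meronym", "wheel"),
    ("automobile", "synonym", "car"),
]

# Label used when an edge is traversed backwards (an assumption of this sketch).
INVERSE = {"hypernym": "hyponym", "meronym": "holonym", "synonym": "synonym"}


def build_graph(edges):
    """Index every triple in both directions: word -> [(relation, word), ...]."""
    graph = {}
    for src, rel, dst in edges:
        graph.setdefault(src, []).append((rel, dst))
        graph.setdefault(dst, []).append((INVERSE[rel], src))
    return graph


def path_occurrences(graph, source, target, max_len=2):
    """Breadth-first search for all paths of at most max_len relations from
    source to target that never revisit a word. Each path occurrence is a
    list of (relation, word) steps."""
    found = []
    queue = deque([(source, [])])
    while queue:
        word, path = queue.popleft()
        if path and word == target:
            found.append(path)
            continue
        if len(path) == max_len:
            continue
        visited = {source} | {w for _, w in path}
        for rel, nxt in graph.get(word, []):
            if nxt not in visited:
                queue.append((nxt, path + [(rel, nxt)]))
    return found


def path_types(graph, entry, neighbor, max_len=2):
    """Count path types (tuples of relation labels) linking a thesaurus entry
    to one of its neighbors. A one-label type is a direct relation; longer
    types are compositions. An empty Counter means the pair stays untyped."""
    occurrences = path_occurrences(graph, entry, neighbor, max_len)
    return Counter(tuple(rel for rel, _ in path) for path in occurrences)


if __name__ == "__main__":
    graph = build_graph(EDGES)
    # Hypothetical thesaurus entry with three ranked neighbors.
    thesaurus = {"car": ["truck", "automobile", "road"]}
    for entry, neighbors in thesaurus.items():
        for neighbor in neighbors:
            types = path_types(graph, entry, neighbor)
            print(entry, neighbor, dict(types) or "untyped")
```

With this toy network, car/automobile receives the direct type (synonym,), car/truck receives two composed types, (hypernym, hyponym) and (meronym, holonym), and car/road remains untyped.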

Notes

1. The experiments were performed on 24 months of the French newspaper Le Monde.
2. Although the Moby thesaurus does not contain only synonyms, we sometimes use the term synonym to refer to all the words associated with one of its entries.
3. http://webdocs.cs.ualberta.ca/lindek/Downloads/sim.tgz.
4. As for A2ST, co-occurrents were weighted with PMI, only co-occurrents with a single occurrence were filtered out, and the Cosine measure was used to compare distributional contexts.
5. http://nlp.stanford.edu/~socherr/ACL2012_wordVectorsTextFile.zip.
6. We particularly thank Adrian Popescu for giving us access to these data.
7. A path type is a sequence of elementary relations, whereas a path occurrence also includes the specific words that the path links.
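Note 4 lists the settings used to build the thesauri: PMI weighting of co-occurrents, removal of co-occurrents seen only once, and Cosine comparison of distributional contexts. The sketch below illustrates these three steps on (word, co-occurrent) observations. The input format and the function names (`pmi_vectors`, `cosine`, `nearest_neighbors`) are hypothetical. Computing marginal counts after filtering and keeping negative PMI values are also assumptions, not details taken from the chapter.

```python
import math
from collections import Counter


def pmi_vectors(observations):
    """Build PMI-weighted context vectors from (word, co-occurrent) observations.

    Pairs seen only once are dropped before weighting (our reading of note 4).
    Marginal counts are computed after that filtering, and negative PMI values
    are kept; both choices are assumptions of this sketch.
    """
    counts = Counter(observations)
    counts = Counter({pair: n for pair, n in counts.items() if n > 1})
    word_totals, context_totals = Counter(), Counter()
    for (word, context), n in counts.items():
        word_totals[word] += n
        context_totals[context] += n
    total = sum(counts.values())
    vectors = {}
    for (word, context), n in counts.items():
        pmi = math.log(n * total / (word_totals[word] * context_totals[context]))
        vectors.setdefault(word, {})[context] = pmi
    return vectors


def cosine(u, v):
    """Cosine between two sparse vectors stored as {context: weight} dicts."""
    dot = sum(weight * v[ctx] for ctx, weight in u.items() if ctx in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbors(vectors, entry, k=10):
    """Rank the other words by Cosine similarity to entry (one thesaurus entry)."""
    target = vectors.get(entry, {})
    scores = [(word, cosine(target, vec)) for word, vec in vectors.items() if word != entry]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical toy observations; (car, eat) occurs once and is filtered out.
    observations = (
        [("car", "drive")] * 3 + [("car", "road")] * 2 + [("car", "eat")]
        + [("truck", "drive")] * 2 + [("truck", "road")] * 2
        + [("apple", "eat")] * 3 + [("apple", "juice")] * 2
    )
    vectors = pmi_vectors(observations)
    print(nearest_neighbors(vectors, "car"))
```

On these toy observations, the single (car, eat) pair is filtered out, so car shares contexts only with truck. Truck is therefore ranked first and apple gets a Cosine of 0.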

References

Blanco, E., & Moldovan, D. (2011). Unsupervised learning of semantic relation composition. In 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011) (pp. 1456–1465). Portland, Oregon.
Budanitsky, A., & Hirst, G. (2006). Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1), 13–47.
Curran, J. R. (2003). From distributional to semantic similarity. Ph.D. thesis, University of Edinburgh.
Curran, J. R., & Moens, M. (2002). Improvements in automatic thesaurus extraction. In Workshop of the ACL Special Interest Group on the Lexicon (SIGLEX) (pp. 59–66). Philadelphia, USA.
Fellbaum, C. (Ed.). (1998). WordNet: An electronic lexical database. Cambridge: The MIT Press.
Ferret, O. (2002). Using collocations for topic segmentation and link detection. In 19th International Conference on Computational Linguistics (COLING 2002) (pp. 260–266). Taipei, Taiwan.
Ferret, O. (2006). Building a network of topical relations from a corpus. In 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 575–580). Genova, Italy.
Ferret, O. (2010). Testing semantic similarity measures for extracting synonyms from a corpus. In Seventh Conference on International Language Resources and Evaluation (LREC’10). Valletta, Malta.
Ferret, O., & Zock, M. (2006). Enhancing electronic dictionaries with an index based on associations. In 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL 2006) (pp. 281–288). Sydney, Australia.
Firth, J. R. (1957). A synopsis of linguistic theory 1930–1955. In Studies in linguistic analysis (pp. 1–32). Oxford: Blackwell.
Freitag, D., Blume, M., Byrnes, J., Chow, E., Kapadia, S., Rohwer, R., et al. (2005). New experiments in distributional representations of synonymy. In Ninth Conference on Computational Natural Language Learning (CoNLL) (pp. 25–32). Ann Arbor, MI, USA.
Gabrilovich, E., & Markovitch, S. (2007). Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In 20th International Joint Conference on Artificial Intelligence (IJCAI 2007) (pp. 6–12).
Grefenstette, G. (1994). Explorations in automatic thesaurus discovery. Boston: Kluwer Academic Publishers.
Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London: Longman.
Harabagiu, S., & Moldovan, D. (1998). Knowledge processing on extended WordNet. In C. Fellbaum (Ed.), WordNet: An electronic lexical database (pp. 379–405). Cambridge: MIT Press.
Hearst, M. A. (1994). Multi-paragraph segmentation of expository text. In 32nd Annual Meeting of the Association for Computational Linguistics (ACL’94) (pp. 9–16). Las Cruces, New Mexico, USA.
Heylen, K., Peirsman, Y., Geeraerts, D., & Speelman, D. (2008). Modelling word similarity: An evaluation of automatic synonymy extraction algorithms. In Sixth Conference on International Language Resources and Evaluation (LREC 2008). Marrakech, Morocco.
Hindle, D. (1990). Noun classification from predicate-argument structures. In 28th Annual Meeting of the Association for Computational Linguistics (ACL 1990) (pp. 268–275). Pittsburgh, Pennsylvania, USA.
Hirst, G., & St-Onge, D. (1998). Lexical chains as representations of context for the detection and correction of malapropisms. In C. Fellbaum (Ed.), WordNet: An electronic lexical database (pp. 305–332). Cambridge: MIT Press.
Huang, E. H., Socher, R., Manning, C. D., & Ng, A. Y. (2012). Improving word representations via global context and multiple word prototypes. In 50th Annual Meeting of the Association for Computational Linguistics (ACL’12) (pp. 873–882).
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240.
Lin, D. (1994). PRINCIPAR: An efficient, broad-coverage, principle-based parser. In COLING’94 (pp. 42–48). Kyoto, Japan.
Lin, D. (1998). Automatic retrieval and clustering of similar words. In 17th International Conference on Computational Linguistics and 36th Annual Meeting of the Association for Computational Linguistics (ACL-COLING’98) (pp. 768–774). Montréal, Canada.
Marton, Y. (2013). Distributional phrasal paraphrase generation for statistical machine translation. ACM Transactions on Intelligent Systems and Technology, 4(3), 1–32.
Mazuel, L., & Sabouret, N. (2008). Semantic relatedness measure using object properties in an ontology. In 7th International Conference on The Semantic Web (ISWC’08) (pp. 681–694). Springer, Karlsruhe, Germany.
Mel’čuk, I. A., & Polguère, A. (1987). A formal lexicon in the meaning-text theory (or how to do lexica with words). Computational Linguistics, 13(3–4), 261–275.
Mikolov, T., Yih, W.-T., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2013) (pp. 746–751). Atlanta, Georgia.
Min, B., Shi, S., Grishman, R., & Lin, C. Y. (2012). Ensemble semantics for large-scale unsupervised relation extraction. In 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012) (pp. 1027–1037). Jeju Island, Korea.
Morlane-Hondère, F., & Fabre, C. (2012). Le test de substituabilité à l’épreuve des corpus : utiliser l’analyse distributionnelle automatique pour l’étude des relations lexicales. In Congrès Mondial de Linguistique Française (CMLF 2012) (pp. 1001–1015). EDP Sciences, Lyon, France.
Morris, J., & Hirst, G. (2004). Non-classical lexical semantic relations. In Workshop on Computational Lexical Semantics of Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (pp. 46–51). Boston, MA.
Popescu, A., & Grefenstette, G. (2011). Social media driven image retrieval. In 1st ACM International Conference on Multimedia Retrieval (ICMR’11), ACM (pp. 1–8). Trento, Italy.
Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. In International Conference on New Methods in Language Processing.
Van de Cruys, T. (2010). Mining for meaning: The extraction of lexico-semantic knowledge from text. Ph.D. thesis, University of Groningen, The Netherlands.
Ward, G. (1996). Moby thesaurus. Moby Project.
Weeds, J. (2003). Measures and applications of lexical distributional similarity. Ph.D. thesis, Department of Informatics, University of Sussex.
Wu, Z., & Palmer, M. (1994). Verb semantics and lexical selection. In 32nd Annual Meeting of the Association for Computational Linguistics (ACL’94) (pp. 133–138). Las Cruces, New Mexico, USA.
Zesch, T., & Gurevych, I. (2010). Wisdom of crowds versus wisdom of linguists: Measuring the semantic relatedness of words. Natural Language Engineering, 16(1), 25–59.
Zock, M. (1996). The power of words in message planning. In 16th International Conference on Computational Linguistics (COLING 1996) (pp. 990–995). Copenhagen, Denmark.
Zock, M. (2002). Sorry, what was your name again, or how to overcome the tip-of-the-tongue problem with the help of a computer? In SEMANET’02: Workshop on Building and Using Semantic Networks (pp. 1–6). Taipei, Taiwan.
Zock, M., & Bilac, S. (2004). Word lookup on the basis of associations: From an idea to a roadmap. In COLING 2004 Workshop: Enhancing and Using Electronic Dictionaries (pp. 29–35). Geneva, Switzerland.
Zock, M., & Quint, J. (2004). Why have them work for peanuts, when it is so easy to provide reward? Motivations for converting a dictionary into a drill tutor. In 5th Workshop on Multilingual Lexical Databases. Grenoble, France.

Author information

Authors and Affiliations

CEA LIST, Vision and Content Engineering Laboratory, 91191, Gif-Sur-Yvette, France
Olivier Ferret

Corresponding author

Correspondence to Olivier Ferret.

Editor information

Editors and Affiliations

CNRS-LIF, UMR 7279, Aix-Marseille University, Marseille, France
Núria Gala
CNRS-LIF, UMR 7279, Aix-Marseille University and University of Mainz, Marseille, France
Reinhard Rapp
CNRS-LIF, UMR 7279, Aix-Marseille University, Marseille, France
Gemma Bel-Enguix

About this chapter

Cite this chapter

Ferret, O. (2015). Typing Relations in Distributional Thesauri. In: Gala, N., Rapp, R., Bel-Enguix, G. (eds) Language Production, Cognition, and the Lexicon. Text, Speech and Language Technology, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-08043-7_8

DOI: https://doi.org/10.1007/978-3-319-08043-7_8
Published: 12 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08042-0
Online ISBN: 978-3-319-08043-7
eBook Packages: Computer Science, Computer Science (R0)
