Abstract
Dictionaries are important tools for language producers, but they are rarely organized for easy access to words from concepts. Such access can be facilitated by relations between words in dictionaries, which make associative lookup possible. Lexical associations can be extracted quite easily from a corpus as first- or second-order co-occurrence relations. However, these associations face two related problems: they are noisy, and the type of relation on which they are based is implicit. In this article, we propose to address the second problem to some extent by studying the types of relations that can be found in distributional thesauri. More precisely, this study relies on a reference lexical network, WordNet in our case, in which the type of each relation is known. This reference network is first used to identify directly the relations of the thesauri that are present in it, and then to characterize, through the detection of patterns of composition of known relations, new kinds of relations that do not appear explicitly in it.
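The idea of typing a thesaurus relation through a reference network can be illustrated with a minimal sketch. It is not the chapter's implementation: the toy edges, the relation labels, the `INVERSE` table, and the function names `build_graph` and `shortest_path` are all assumptions made for illustration, and a small hand-written network stands in for WordNet. A word pair linked by a single edge is typed directly, whereas a pair linked only by a longer path is typed by the composition of the relations along that path.

```python
from collections import defaultdict

# Hypothetical toy network of typed relations (NOT real WordNet data).
EDGES = [
    ("car", "hypernym", "vehicle"),
    ("truck", "hypernym", "vehicle"),
    ("car", "synonym", "automobile"),
    ("wheel", "part_of", "car"),
]

# Label used when an edge is traversed in the opposite direction (assumed names).
INVERSE = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "synonym": "synonym",
    "part_of": "has_part",
    "has_part": "part_of",
}


def build_graph(edges):
    """Index the typed edges in both directions."""
    graph = defaultdict(list)
    for source, relation, target in edges:
        graph[source].append((relation, target))
        graph[target].append((INVERSE[relation], source))
    return graph


def shortest_path(graph, start, goal, max_len=2):
    """Breadth-first search limited to max_len edges.

    Returns a list of (relation, word) steps, or None if no path is found.
    """
    if start == goal:
        return []
    frontier = [(start, [])]
    visited = {start}
    for _ in range(max_len):
        next_frontier = []
        for word, path in frontier:
            for relation, neighbor in graph[word]:
                if neighbor in visited:
                    continue
                new_path = path + [(relation, neighbor)]
                if neighbor == goal:
                    return new_path
                visited.add(neighbor)
                next_frontier.append((neighbor, new_path))
        frontier = next_frontier
    return None


def path_type(path):
    """Keep only the sequence of relation labels, dropping the words."""
    return tuple(relation for relation, _ in path)
```

With this toy data, the pair (car, automobile) is typed directly as a synonym relation, while (car, truck) is only reachable through `vehicle` and is therefore typed by the composition hypernym followed by hyponym, that is, as co-hyponyms.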
Notes
1. The experiments were performed on 24 months of the French Le Monde newspaper.
2. Although the Moby thesaurus does not contain only synonyms, we will sometimes use the term synonym to refer to all the words associated with one of its entries.
3.
4. As for A2ST, the weighting function of co-occurrents was PMI, only the co-occurrents with one occurrence were filtered, and the Cosine measure was applied for comparing distributional contexts.
5.
6. We particularly thank Adrian Popescu for giving us access to these data.
7. A path type is made of a sequence of elementary relations, while a path occurrence also includes the specific words that are linked.
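The configuration mentioned in the note on A2ST (PMI weighting, removal of co-occurrents with one occurrence, Cosine comparison of distributional contexts) can be sketched as follows. This is only an illustrative reading under stated assumptions: the function names `pmi_vectors` and `cosine` are hypothetical, the filter is interpreted as dropping word/co-occurrent pairs observed once, the totals are computed after filtering, and raw (possibly negative) PMI values are kept. None of these details is confirmed by the text above.

```python
import math
from collections import Counter


def pmi_vectors(cooc, min_count=2):
    """Turn raw co-occurrence counts into PMI-weighted context vectors.

    cooc maps each word to a Counter of its co-occurrents.
    Pairs seen fewer than min_count times are discarded (assumed reading
    of "only the co-occurrents with one occurrence were filtered").
    """
    kept = {
        word: {c: n for c, n in contexts.items() if n >= min_count}
        for word, contexts in cooc.items()
    }
    total = sum(n for contexts in kept.values() for n in contexts.values())
    word_total = {word: sum(contexts.values()) for word, contexts in kept.items()}
    context_total = Counter()
    for contexts in kept.values():
        context_total.update(contexts)

    vectors = {}
    for word, contexts in kept.items():
        # PMI(w, c) = log( P(w, c) / (P(w) * P(c)) )
        vectors[word] = {
            c: math.log(n * total / (word_total[word] * context_total[c]))
            for c, n in contexts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[c] for c, weight in u.items() if c in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Ranking, for each entry, the other words by decreasing Cosine value would then give its list of distributional neighbors, which is the kind of thesaurus entry whose relations are typed in this work.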
References
Blanco, E., & Moldovan, D. (2011). Unsupervised learning of semantic relation composition. In 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011) (pp. 1456–1465). Portland, Oregon.
Budanitsky, A., & Hirst, G. (2006). Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1), 13–47.
Curran, J. R. (2003). From distributional to semantic similarity. Ph.D. thesis, University of Edinburgh.
Curran, J. R., & Moens, M. (2002). Improvements in automatic thesaurus extraction. In Workshop of the ACL Special Interest Group on the Lexicon (SIGLEX) (pp. 59–66). Philadelphia, USA.
Fellbaum, C. (Ed.). (1998). WordNet: An electronic lexical database. Cambridge: The MIT Press.
Ferret, O. (2002). Using collocations for topic segmentation and link detection. In 19th International Conference on Computational Linguistics (COLING 2002) (pp. 260–266). Taipei, Taiwan.
Ferret, O. (2006). Building a network of topical relations from a corpus. In 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 575–580). Genova, Italy.
Ferret, O. (2010). Testing semantic similarity measures for extracting synonyms from a corpus. In Seventh Conference on International Language Resources and Evaluation (LREC’10). Valletta, Malta.
Ferret, O., & Zock, M. (2006). Enhancing electronic dictionaries with an index based on associations. In 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL 2006) (pp. 281–288). Sydney, Australia.
Firth, J. R. (1957). A synopsis of linguistic theory 1930–1955. In Studies in linguistic analysis (pp. 1–32). Oxford: Blackwell.
Freitag, D., Blume, M., Byrnes, J., Chow, E., Kapadia, S., Rohwer, R., et al. (2005). New experiments in distributional representations of synonymy. In Ninth Conference on Computational Natural Language Learning (CoNLL) (pp. 25–32). Ann Arbor, MI, USA.
Gabrilovich, E., & Markovitch, S. (2007). Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In 20th International Joint Conference on Artificial Intelligence (IJCAI 2007) (pp. 6–12).
Grefenstette, G. (1994). Explorations in automatic thesaurus discovery. Boston: Kluwer Academic Publishers.
Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London: Longman.
Harabagiu, S., & Moldovan, D. (1998). Knowledge processing on extended WordNet. In C. Fellbaum (Ed.), WordNet: An electronic lexical database (pp. 379–405). Cambridge: MIT Press.
Hearst, M. A. (1994). Multi-paragraph segmentation of expository text. In 32nd Annual Meeting of the Association for Computational Linguistics (ACL’94) (pp. 9–16). Las Cruces, New Mexico, USA.
Heylen, K., Peirsman, Y., Geeraerts, D., & Speelman, D. (2008). Modelling word similarity: An evaluation of automatic synonymy extraction algorithms. In Sixth Conference on International Language Resources and Evaluation (LREC 2008). Marrakech, Morocco.
Hindle, D. (1990). Noun classification from predicate-argument structures. In 28th Annual Meeting of the Association for Computational Linguistics (ACL 1990) (pp. 268–275). Pittsburgh, Pennsylvania, USA.
Hirst, G., & St-Onge, D. (1998). Lexical chains as representations of context for the detection and correction of malapropisms. In C. Fellbaum (Ed.), WordNet: An electronic lexical database (pp. 305–332). Cambridge: MIT Press.
Huang, E. H., Socher, R., Manning, C. D., & Ng, A. Y. (2012). Improving word representations via global context and multiple word prototypes. In 50th Annual Meeting of the Association for Computational Linguistics (ACL’12) (pp. 873–882).
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240.
Lin, D. (1994). PRINCIPAR: An efficient, broad-coverage, principle-based parser. In COLING’94 (pp. 42–48). Kyoto, Japan.
Lin, D. (1998). Automatic retrieval and clustering of similar words. In 17th International Conference on Computational Linguistics and 36th Annual Meeting of the Association for Computational Linguistics (ACL-COLING’98) (pp. 768–774). Montréal, Canada.
Marton, Y. (2013). Distributional phrasal paraphrase generation for statistical machine translation. ACM Transactions on Intelligent Systems and Technology, 4(3), 1–32.
Mazuel, L., & Sabouret, N. (2008). Semantic relatedness measure using object properties in an ontology. In 7th International Conference on The Semantic Web (ISWC’08) (pp. 681–694). Springer, Karlsruhe, Germany.
Mel’čuk, I. A., & Polguère, A. (1987). A formal lexicon in the meaning-text theory or (how to do lexica with words). Computational Linguistics, 13(3–4), 261–275.
Mikolov, T., Yih, W.-T., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2013) (pp. 746–751). Atlanta, Georgia.
Min, B., Shi, S., Grishman, R., & Lin, C. Y. (2012). Ensemble semantics for large-scale unsupervised relation extraction. In 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012) (pp. 1027–1037). Jeju Island, Korea.
Morlane-Hondère, F., & Fabre, C. (2012). Le test de substituabilité à l’épreuve des corpus : utiliser l’analyse distributionnelle automatique pour l’étude des relations lexicales. In Congrès Mondial de Linguistique Française (CMLF 2012) (pp. 1001–1015). EDP Sciences, Lyon, France.
Morris, J., & Hirst, G. (2004). Non-classical lexical semantic relations. In Workshop on Computational Lexical Semantics of Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (pp. 46–51). Boston, MA.
Popescu, A., & Grefenstette, G. (2011). Social media driven image retrieval. In 1st ACM International Conference on Multimedia Retrieval (ICMR’11), ACM (pp. 1–8). Trento, Italy.
Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. In International Conference on New Methods in Language Processing.
Van de Cruys, T. (2010). Mining for meaning: The extraction of lexico-semantic knowledge from text. Ph.D. thesis, University of Groningen, The Netherlands.
Ward, G. (1996). Moby thesaurus. Moby Project.
Weeds, J. (2003). Measures and applications of lexical distributional similarity. Ph.D. thesis, Department of Informatics, University of Sussex.
Wu, Z., & Palmer, M. (1994). Verbs semantics and lexical selection. In 32nd Annual Meeting of the Association for Computational Linguistics (ACL’94) (pp. 133–138). Las Cruces, New Mexico, USA.
Zesch, T., & Gurevych, I. (2010). Wisdom of crowds versus wisdom of linguists: Measuring the semantic relatedness of words. Natural Language Engineering, 16(1), 25–59.
Zock, M. (1996). The power of words in message planning. In 16th International Conference on Computational Linguistics (COLING 1996) (pp. 990–995). Copenhagen, Denmark.
Zock, M. (2002). Sorry, what was your name again, or how to overcome the tip-of-the tongue problem with the help of a computer? In SEMANET’02: Workshop on Building and Using Semantic Networks (pp. 1–6), Taipei, Taiwan.
Zock, M., & Bilac, S. (2004). Word lookup on the basis of associations: From an idea to a roadmap. In COLING 2004 Workshop: Enhancing and Using Electronic Dictionaries (pp. 29–35). Geneva, Switzerland.
Zock, M., & Quint, J. (2004). Why have them work for peanuts, when it is so easy to provide reward? Motivations for converting a dictionary into a drill tutor. In 5th Workshop on Multilingual Lexical Databases. Grenoble, France.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Ferret, O. (2015). Typing Relations in Distributional Thesauri. In: Gala, N., Rapp, R., Bel-Enguix, G. (eds) Language Production, Cognition, and the Lexicon. Text, Speech and Language Technology, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-08043-7_8
DOI: https://doi.org/10.1007/978-3-319-08043-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08042-0
Online ISBN: 978-3-319-08043-7
eBook Packages: Computer Science (R0)