Abstract
A collocation is a type of multiword expression formed by two parts: a base and a collocate. Usually, the base keeps its denotative (literal) meaning, while the collocate takes on a connotative meaning. Examples of collocations include pay attention, easy as pie, strongly condemn, and lend support. Meaning-Text Theory introduced lexical functions to, among other objectives, represent the meaning that holds between the base and the collocate, or the relation between the base and a support verb. For example, the lexical function Magn represents intensification, while the lexical function Caus, applied to a base, returns the support verb that expresses the causation of the action denoted by the collocation. In a dependency parse, each word (the dependent) is directly linked to its governor in the phrase. In this paper, we show how we combine dependency parsing, used to extract collocation candidates, with a lexical network based on lexical functions, used to identify the true collocations among the candidates. The candidates are extracted from a French corpus according to 14 dependency relations. The identified collocations are classified according to the semantic group of the lexical functions that model them. We obtained an overall precision (across all dependency types) of 76.3%, with a precision above 95% for collocations with certain dependency relations. We also found that about 86% of the identified collocations belong to only four semantic categories: qualification, support verb, location, and action/event.
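The two-stage idea described in the abstract (extract governor/dependent pairs from a dependency parse, then keep only the pairs attested in a lexical network of lexical functions) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Token` layout, the relation labels in `TARGET_RELATIONS`, the toy `LEXICAL_NETWORK` entries, and the semantic group assigned to each lexical function are all assumptions made for the example. The abstract does not list the 14 relations or describe the network's format.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    idx: int     # 1-based position in the sentence
    lemma: str   # lemmatized form
    head: int    # index of the governor (0 for the root)
    deprel: str  # dependency relation to the governor


# Hypothetical subset of relations; the paper uses 14, not listed in the abstract.
TARGET_RELATIONS = {"obj", "mod"}

# Toy lexical network (assumed format):
# (base, collocate) -> (lexical function, semantic group)
LEXICAL_NETWORK = {
    ("attention", "prêter"): ("Oper1", "support verb"),
    ("peur", "bleu"): ("Magn", "qualification"),
}


def extract_candidates(sentence: list[Token]) -> Iterator[tuple[str, str, str]]:
    """Yield (governor lemma, dependent lemma, relation) for targeted relations."""
    by_idx = {tok.idx: tok for tok in sentence}
    for tok in sentence:
        if tok.deprel in TARGET_RELATIONS and tok.head in by_idx:
            yield by_idx[tok.head].lemma, tok.lemma, tok.deprel


def identify_collocations(sentence: list[Token]) -> list[tuple[str, str, str, str, str]]:
    """Keep candidates attested in the network, labeled with function and group."""
    found = []
    for gov, dep, rel in extract_candidates(sentence):
        # The base may be the governor (noun + adjective) or the
        # dependent (verb + object), so both orderings are tried.
        for base, collocate in ((gov, dep), (dep, gov)):
            entry = LEXICAL_NETWORK.get((base, collocate))
            if entry is not None:
                lexical_function, group = entry
                found.append((base, collocate, rel, lexical_function, group))
    return found


# Hand-written parse of "Il prête une attention particulière" (illustrative only).
SENTENCE = [
    Token(1, "il", 2, "suj"),
    Token(2, "prêter", 0, "root"),
    Token(3, "un", 4, "det"),
    Token(4, "attention", 2, "obj"),
    Token(5, "particulier", 4, "mod"),
]

if __name__ == "__main__":
    for hit in identify_collocations(SENTENCE):
        print(hit)
```

In this toy parse, two candidates are extracted (the `obj` and `mod` pairs), but only the pair found in the network is retained. Grouping the retained pairs by their semantic group would then give the kind of category distribution the abstract reports.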
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Fonseca, A., Sadat, F., Lareau, F. (2017). Combining Dependency Parsing and a Lexical Network Based on Lexical Functions for the Identification of Collocations. In: Mitkov, R. (ed.) Computational and Corpus-Based Phraseology. EUROPHRAS 2017. Lecture Notes in Computer Science, vol 10596. Springer, Cham. https://doi.org/10.1007/978-3-319-69805-2_31
DOI: https://doi.org/10.1007/978-3-319-69805-2_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69804-5
Online ISBN: 978-3-319-69805-2
eBook Packages: Computer Science (R0)