Combining Dependency Parsing and a Lexical Network Based on Lexical Functions for the Identification of Collocations

  • Conference paper
Computational and Corpus-Based Phraseology (EUROPHRAS 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10596)

Abstract

A collocation is a type of multiword expression formed by two parts: a base and a collocate. Usually, the base has a denotative or literal meaning, while the collocate has a connotative meaning. Examples of collocations include pay attention, easy as pie, strongly condemn, and lend support. Meaning-Text Theory introduced lexical functions to, among other objectives, represent the meaning relation that holds between the base and the collocate, or the relation between the base and a support verb. For example, the lexical function Magn represents intensification of meaning, while the lexical function Caus, applied to a base, returns the support verb that expresses the causation of the action denoted by the collocation. In a dependency parse, each word (the dependent) is directly linked to its governor in a phrase. In this paper, we show how we combine dependency parsing, used to extract collocation candidates, with a lexical network based on lexical functions, used to identify the true collocations among the candidates. The candidates are extracted from a French corpus according to 14 dependency relations. The identified collocations are classified according to the semantic group of the lexical functions that model them. We obtained an overall precision (across all dependency types) of 76.3%, with a precision higher than 95% for collocations with certain dependency relations. We also found that about 86% of the identified collocations belong to only four semantic categories: qualification, support verb, location, and action/event.
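The pipeline summarized in the abstract can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the `Candidate` type, the `identify` function, the toy `LEXICAL_NETWORK` dictionary, the two relation labels, and the mapping from lexical functions to semantic groups are all hypothetical. The abstract does not list the 14 relations, and the real lexical network is a far richer resource than a dictionary. English examples from the abstract are used for readability, although the paper works on a French corpus.

```python
from collections import Counter
from typing import NamedTuple


class Candidate(NamedTuple):
    """A governor/dependent pair taken from a dependency parse (hypothetical type)."""
    governor: str
    dependent: str
    relation: str


# Toy stand-in for the lexical network: (base, collocate) -> lexical function.
# Assumption: the real network is a graph resource, not a flat dictionary.
LEXICAL_NETWORK = {
    ("condemn", "strongly"): "Magn",
    ("attention", "pay"): "Oper1",
    ("support", "lend"): "Oper1",
}

# Illustrative grouping only; the paper's own taxonomy may differ.
SEMANTIC_GROUP = {"Magn": "qualification", "Oper1": "support verb"}

# Assumption: two placeholder relation labels instead of the paper's 14.
ALLOWED_RELATIONS = {"advmod", "obj"}


def identify(candidates):
    """Keep candidates that the lexical network links through a lexical function."""
    found = []
    for cand in candidates:
        if cand.relation not in ALLOWED_RELATIONS:
            continue
        # The base may be either the governor or the dependent, so try both.
        for base, collocate in (
            (cand.governor, cand.dependent),
            (cand.dependent, cand.governor),
        ):
            function = LEXICAL_NETWORK.get((base, collocate))
            if function is not None:
                group = SEMANTIC_GROUP.get(function, "other")
                found.append((base, collocate, function, group))
                break
    return found


CANDIDATES = [
    Candidate("condemn", "strongly", "advmod"),
    Candidate("pay", "attention", "obj"),
    Candidate("lend", "support", "obj"),
    Candidate("read", "book", "obj"),   # free combination, absent from the network
    Candidate("book", "the", "det"),    # relation not among the selected ones
]

collocations = identify(CANDIDATES)
group_counts = Counter(group for *_, group in collocations)
```

In this sketch, filtering by relation plays the role of candidate extraction, and the dictionary lookup plays the role of validation against the lexical network. Counting the semantic groups mirrors the classification step the abstract reports.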

Notes

  1. http://universaldependencies.org/introduction.html

  2. http://www.atilf.fr/spip.php?article908

  3. http://www.atilf.fr/spip.php?rubrique77

  4. http://www.maltparser.org/

  5. http://www.eecs.berkeley.edu/~petrov/berkeleyParser

  6. http://mstparser.sourceforge.net

  7. http://maltparser.org/mco/french_parser/fremalt.html

  8. https://opennlp.apache.org/

  9. https://dkpro.github.io/

  10. http://www.maltparser.org/

Author information

Corresponding author

Correspondence to Alexsandro Fonseca.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Fonseca, A., Sadat, F., Lareau, F. (2017). Combining Dependency Parsing and a Lexical Network Based on Lexical Functions for the Identification of Collocations. In: Mitkov, R. (ed.) Computational and Corpus-Based Phraseology. EUROPHRAS 2017. Lecture Notes in Computer Science, vol 10596. Springer, Cham. https://doi.org/10.1007/978-3-319-69805-2_31

  • DOI: https://doi.org/10.1007/978-3-319-69805-2_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69804-5

  • Online ISBN: 978-3-319-69805-2

  • eBook Packages: Computer Science, Computer Science (R0)
