Definitions and Characteristics

Multiword Expressions Acquisition

Abstract

In this chapter, we discuss definitions and properties of multiword expressions (MWEs) and briefly introduce the research field of automatic MWE treatment. Although we include pointers to linguistic and psycholinguistic studies, most of the related work cited in this chapter has a strong computational background.

Notes

  1.

    Exaggerating somewhat, one could even say that the set of lexical units [the lexicon] is the language.

  2.

    See Sect. 2.2 for a clarification on the difference between MWE and collocation.

  3.

    The DEC [Dictionnaire Explicatif Combinatoire] does not describe all phrasemes in the same way. Full phrasemes […] and quasi-phrasemes […], that is, phrasemes that cannot be fully described in terms of at least one of their constituents, form independent entries, just like lexemes. Semi-phrasemes (= collocations […]) are described under the entry of one of their constituents, by means of what are called lexical functions.

  4.

    http://mwe.stanford.edu/

  5.

    Equally well known is the recalcitrant nature of the word word, which has so far eluded attempts to delimit it precisely and has caused much ink to flow for decades.

  6.

    Although this definition refers to sequences of words, thus assuming that MWEs are contiguous, we prefer to see them as word combinations or groups for greater generality.

  7.

    See Sect. 2.4.

  8.

    See Sect. 2.4.

  9.

    http://mwe.stanford.edu/

  10.

    One can say The Dow Jones average of 30 industrials, The Dow average, The Dow industrials or The Dow Jones industrial, but never ?The Jones industrials, ?The industrial Dow, ?The Dow of 30 industrials nor ?The Dow industrial.

  11.

    This classification is largely incomplete. It does not cover the whole set of MWEs defined in our work.

  12.

    Sometimes, the opposite may occur, that is, the simple denominal verb may come from the corresponding noun in the construction, for instance, give an example = exemplify.

  13.

    To present may also mean to make a gift, and using the analytic expression with the same support verb helps to disambiguate.

  14.

    http://clic2.cimec.unitn.it/starsem2013/

  15.

    http://typo.uni-konstanz.de/parseme/

  16.

    http://multiword.sourceforge.net/

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Ramisch, C. (2015). Definitions and Characteristics. In: Multiword Expressions Acquisition. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-09207-2_2

  • DOI: https://doi.org/10.1007/978-3-319-09207-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09206-5

  • Online ISBN: 978-3-319-09207-2

  • eBook Packages: Computer Science, Computer Science (R0)
