Abstract
In this chapter, we discuss definitions and properties of multiword expressions (MWEs) and briefly introduce the research field of automatic MWE treatment. Although we include pointers to linguistic and psycholinguistic studies, most of the related work cited in this chapter has a strong computational background.
Notes
- 1.
Exaggerating somewhat, one could even say that the set of lexical units [the lexicon] is the language.
- 2.
See Sect. 2.2 for a clarification on the difference between MWE and collocation.
- 3.
The DEC [Dictionnaire Explicatif Combinatoire] does not describe all phrasemes in the same way. Full phrasemes […] and quasi-phrasemes […], that is, phrasemes that cannot be completely described in terms of at least one of their constituents, form independent entries, just like lexemes. Semi-phrasemes (= collocations […]) are described under the entry of one of their constituents, by means of what are called lexical functions.
- 4.
- 5.
we are just as familiar with the recalcitrant nature of the word word, which has so far escaped attempts to circumscribe it precisely and has caused much ink to flow for decades.
- 6.
Although this definition refers to sequences of words, thus assuming that MWEs are contiguous, we prefer to see them as word combinations or groups for greater generality.
- 7.
See Sect. 2.4.
- 8.
See Sect. 2.4.
- 9.
- 10.
One can say The Dow Jones average of 30 industrials, The Dow average, The Dow industrials or The Dow Jones industrial, but never ?The Jones industrials, ?The industrial Dow, ?The Dow of 30 industrials nor ?The Dow industrial.
- 11.
This classification is largely incomplete. It does not cover the whole set of MWEs defined in our work.
- 12.
Sometimes, the opposite may occur, that is, the simple denominal verb may come from the corresponding noun in the construction, for instance, give an example = exemplify.
- 13.
To present may also mean to make a gift, and using the analytic expression with the same support verb helps to disambiguate.
- 14.
- 15.
- 16.
References
Anastasiou D, Hashimoto C, Nakov P, Kim SN (eds) (2009) Proceedings of the ACL workshop on multiword expressions: identification, interpretation, disambiguation, applications (MWE 2009), Singapore. Association for Computational Linguistics/Suntec, 70p. http://aclweb.org/anthology-new/W/W09/W09-29
Attia M, Toral A, Tounsi L, Pecina P, van Genabith J (2010) Automatic extraction of Arabic multiword expressions. In: Laporte É, Nakov P, Ramisch C, Villavicencio A (eds) Proceedings of the COLING workshop on multiword expressions: from theory to applications (MWE 2010), Beijing. Association for Computational Linguistics, pp 18–26
Baldwin T, Kim SN (2010) Multiword expressions. In: Indurkhya N, Damerau FJ (eds) Handbook of natural language processing, 2nd edn. CRC/Taylor and Francis Group, Boca Raton, pp 267–292
Bond F, Korhonen A, McCarthy D, Villavicencio A (eds) (2003) Proceedings of the ACL workshop on multiword expressions: analysis, acquisition and treatment (MWE 2003), Sapporo. Association for Computational Linguistics, 104p. http://aclweb.org/anthology-new/W/W03/W03-1800
Bu F, Zhu X, Li M (2010) Measuring the non-compositionality of multiword expressions. In: Huang CR, Jurafsky D (eds) Proceedings of the 23rd international conference on computational linguistics (COLING 2010), Beijing. The Coling 2010 Organizing Committee, pp 116–124. http://www.aclweb.org/anthology/C10-1014
Butnariu C, Kim SN, Nakov P, Séaghdha DO, Szpakowicz S, Veale T (2010) Semeval-2 task 9: the interpretation of noun compounds using paraphrasing verbs and prepositions. In: Erk K, Strapparava C (eds) Proceedings of the 5th international workshop on semantic evaluation (SemEval 2010), Uppsala. Association for Computational Linguistics, pp 39–44. http://www.aclweb.org/anthology/S10-1007
Cabré MT (1992) La terminologia. La teoria, els mètodes, les aplicacions. Empúries, Barcelona, 527p
Calzolari N, Fillmore C, Grishman R, Ide N, Lenci A, Macleod C, Zampolli A (2002) Towards best practice for multiword expressions in computational lexicons. In: Proceedings of the third international conference on language resources and evaluation (LREC 2002), Las Palmas. European Language Resources Association, pp 1934–1940
Carpuat M, Diab M (2010) Task-based evaluation of multiword expressions: a pilot study in statistical machine translation. In: Proceedings of human language technology: the 2010 annual conference of the North American chapter of the Association for Computational Linguistics (NAACL 2010), Los Angeles. Association for Computational Linguistics, pp 242–245. http://www.aclweb.org/anthology/N10-1029
Choueka Y (1988) Looking for needles in a haystack or locating interesting collocational expressions in large textual databases. In: Fluhr C, Walker DE (eds) Proceedings of the 2nd international conference on computer-assisted information retrieval (Recherche d’Information et ses Applications – RIA 1988), Cambridge. CID, pp 609–624
Church K (2013) How many multiword expressions do people know? ACM Trans Speech Lang Process Spec Issue Multiword Expr Theory Pract Use Part 1 (TSLP) 10(2):1–13
Church K, Hanks P (1990) Word association norms, mutual information, and lexicography. Comput Linguist 16(1):22–29
Cruse DA (1986) Lexical semantics. Cambridge University Press, Cambridge, 310p
Dagan I, Church K (1994) Termight: identifying and translating technical terminology. In: Proceedings of the 4th applied natural language processing conference (ANLP 1994), Stuttgart. Association for Computational Linguistics, pp 34–40. doi:10.3115/974358.974367, http://www.aclweb.org/anthology/A94-1006
de Medeiros Caseli H, Ramisch C, das Graças Volpe Nunes M, Villavicencio A (2010) Alignment-based extraction of multiword expressions. Lang Resour Eval Spec Issue Multiword Expr Hard Going or Plain Sailing 44(1–2):59–77. doi:10.1007/s10579-009-9097-9, http://www.springerlink.com/content/H7313427H78865MG
Devereux B, Costello F (2007) Learning to interpret novel noun-noun compounds: evidence from a category learning experiment. In: Buttery P, Villavicencio A, Korhonen A (eds) Proceedings of the ACL 2007 workshop on cognitive aspects of computational language acquisition, Prague. Association for Computational Linguistics, pp 89–96. http://www.aclweb.org/anthology/W/W07/W07-0612
Dunning T (1993) Accurate methods for the statistics of surprise and coincidence. Comput Linguist 19(1):61–74
Evert S (2004) The statistics of word cooccurrences: word pairs and collocations. PhD thesis, Institut für maschinelle Sprachverarbeitung, University of Stuttgart, Stuttgart, 353p
Fazly A, Stevenson S (2007) Distinguishing subtypes of multiword expressions using linguistically-motivated statistical measures. In: Grégoire N, Evert S, Kim SN (eds) Proceedings of the ACL workshop on a broader perspective on multiword expressions (MWE 2007), Prague. Association for Computational Linguistics, pp 9–16. http://www.aclweb.org/anthology/W/W07/W07-1102
Fillmore CJ, Kay P, O’Connor MC (1988) Regularity and idiomaticity in grammatical constructions: the case of let alone. Language 64:501–538. http://www.jstor.org/stable/414531
Firth JR (1957) Papers in linguistics 1934–1951. Oxford University Press, Oxford, 233p
Frantzi K, Ananiadou S, Mima H (2000) Automatic recognition of multiword terms: the C-value/NC-value method. Int J Digit Libr 3(2):115–130
Grégoire N, Evert S, Kim SN (eds) (2007) Proceedings of the ACL workshop on a broader perspective on multiword expressions (MWE 2007), Prague. Association for Computational Linguistics, 80p. http://aclweb.org/anthology-new/W/W07/W07-11
Grégoire N, Evert S, Krenn B (eds) (2008) Proceedings of the LREC workshop towards a shared task for multiword expressions (MWE 2008), Marrakech, 57p. http://www.lrec-conf.org/proceedings/lrec2008/workshops/W20_Proceedings.pdf
Hendrickx I, Kim SN, Kozareva Z, Nakov P, Séaghdha DO, Padó S, Pennacchiotti M, Romano L, Szpakowicz S (2010) Semeval-2010 task 8: multi-way classification of semantic relations between pairs of nominals. In: Erk K, Strapparava C (eds) Proceedings of the 5th international workshop on semantic evaluation (SemEval 2010), Uppsala. Association for Computational Linguistics, pp 33–38. http://www.aclweb.org/anthology/S10-1006
Jackendoff R (1997) Twistin’ the night away. Language 73:534–559
Joshi A (2010) Multi-word expressions as discourse relation markers (DRMs). In: Laporte É, Nakov P, Ramisch C, Villavicencio A (eds) Proceedings of the COLING workshop on multiword expressions: from theory to applications (MWE 2010), Beijing. Association for Computational Linguistics, p 89
Joyce T, Srdanović I (2008) Comparing lexical relationships observed within Japanese collocation data and Japanese word association norms. In: Zock M, Huang CR (eds) Proceedings of the COLING 2008 workshop on cognitive aspects of the lexicon (COGALEX 2008), Manchester. The Coling 2008 Organizing Committee, pp 1–8. http://www.aclweb.org/anthology/W08-1901
Justeson JS, Katz SM (1995) Technical terminology: some linguistic properties and an algorithm for identification in text. Nat Lang Eng 1(1):9–27
Kim SN, Medelyan O, Kan MY, Baldwin T (2010) Semeval-2010 task 5: automatic keyphrase extraction from scientific articles. In: Erk K, Strapparava C (eds) Proceedings of the 5th international workshop on semantic evaluation (SemEval 2010), Uppsala. Association for Computational Linguistics, pp 21–26. http://www.aclweb.org/anthology/S10-1004
Kordoni V, Ramisch C, Villavicencio A (eds) (2011) Proceedings of the ACL workshop on multiword expressions: from parsing and generation to the real world (MWE 2011), Portland. Association for Computational Linguistics, 144p. http://www.aclweb.org/anthology/W/W11/W11-08
Kordoni V, Ramisch C, Villavicencio A (eds) (2013) Proceedings of the 9th workshop on multiword expressions (MWE 2013), Atlanta. Association for Computational Linguistics, 144p. http://www.aclweb.org/anthology/W13-10
Kordoni V, Savary A, Egg M, Wehrli E, Evert S (eds) (2014) Proceedings of the 10th workshop on multiword expressions (MWE 2014), Gothenburg. Association for Computational Linguistics, 133p. http://www.aclweb.org/anthology/W14-08
Krieger M, Finatto MJB (2004) Introdução à Terminologia: teoria & prática. Editora Contexto, São Paulo, 223p
Laporte É, Nakov P, Ramisch C, Villavicencio A (eds) (2010) Proceedings of the COLING workshop on multiword expressions: from theory to applications (MWE 2010), Beijing. Association for Computational Linguistics, 89p. http://aclweb.org/anthology-new/W/W10/W10-37
Lavagnino E, Park J (2010) Conceptual structure of automatically extracted multi-word terms from domain specific corpora: a case study for Italian. In: Zock M, Rapp R (eds) Proceedings of the 2nd workshop on cognitive aspects of the lexicon (COGALEX 2010), Beijing. The Coling 2010 Organizing Committee, pp 48–55. http://www.aclweb.org/anthology/W10-3408
Lin D (1998a) Automatic retrieval and clustering of similar words. In: Proceedings of the 36th annual meeting of the Association for Computational Linguistics and 17th international conference on computational linguistics, Montreal, vol 2. Association for Computational Linguistics, pp 768–774. doi:10.3115/980691.980696, http://www.aclweb.org/anthology/P98-2127
Lin D (1998b) Extracting collocations from text corpora. In: First workshop on computational terminology, Montreal, pp 57–63
Lin D (1999) Automatic identification of non-compositional phrases. In: Proceedings of the 37th annual meeting of the Association for Computational Linguistics (ACL 1999), College Park. Association for Computational Linguistics, pp 317–324
Manning CD, Schütze H (1999) Foundations of statistical natural language processing. MIT, Cambridge, 620p
Mel’čuk I, Polguère A (1987) A formal lexicon in the meaning-text theory or (how to do lexica with words). Comput Linguist 13(3–4):261–275
Mel’čuk I, Arbatchewsky-Jumarie N, Elnitsky L, Iordanskaja L, Lessard A (1984) Dictionnaire explicatif et combinatoire du français contemporain. Recherches lexico-sémantiques I. Les presses de l’Université de Montréal, Montréal, 172p
Mel’čuk I, Arbatchewsky-Jumarie N, Dagenais L, Elnitsky L, Iordanskaja L, Lefebvre MN, Mantha S (1988) Dictionnaire explicatif et combinatoire du français contemporain. Recherches lexico-sémantiques II. Les presses de l’Université de Montréal, Montréal, 332p
Mel’čuk I, Arbatchewsky-Jumarie N, Iordanskaja L, Mantha S (1992) Dictionnaire explicatif et combinatoire du français contemporain. Recherches lexico-sémantiques III. Les presses de l’Université de Montréal, Montréal, 323p
Mel’čuk I, Clas A, Polguère A (1995) Introduction à la lexicologie explicative et combinatoire. Editions Duculot, Louvain la Neuve, 256p
Mel’čuk I, Arbatchewsky-Jumarie N, Clas A, Mantha S, Polguère A (1999) Dictionnaire explicatif et combinatoire du français contemporain. Recherches lexico-sémantiques IV. Les presses de l’Université de Montréal, Montréal, 347p
Mitkov R, Monti J, Pastor GC, Seretan V (eds) (2013) Proceedings of the MT summit 2013 workshop on multi-word units in machine translation and translation technology (MUMTTT 2013), Nice.
Moirón BV, Villavicencio A, McCarthy D, Evert S, Stevenson S (eds) (2006) Proceedings of the COLING/ACL workshop on multiword expressions: identifying and exploiting underlying properties (MWE 2006), Sydney. Association for Computational Linguistics, 61p. http://aclweb.org/anthology-new/W/W06/W06-12
Nakov P (2013) On the interpretation of noun compounds: syntax, semantics, and entailment. Nat Lang Eng Spec Issue Noun Compd 19(3):291–330. doi:10.1017/S1351324913000065, http://journals.cambridge.org/article_S1351324913000065
Nematzadeh A, Fazly A, Stevenson S (2012, to appear) Child acquisition of multiword verbs: a computational investigation. In: Poibeau T, Villavicencio A, Korhonen A, Alishahi A (eds) Cognitive aspects of computational language acquisition, Springer, Heidelberg
Pal S, Naskar SK, Pecina P, Bandyopadhyay S, Way A (2010) Handling named entities and compound verbs in phrase-based statistical machine translation. In: Laporte É, Nakov P, Ramisch C, Villavicencio A (eds) Proceedings of the COLING workshop on multiword expressions: from theory to applications (MWE 2010), Beijing. Association for Computational Linguistics, pp 45–53
Pearce D (2001) Synonymy in collocation extraction. In: WordNet and other lexical resources: applications, extensions and customizations (NAACL 2001 workshop), Pittsburgh, pp 41–46
Ramisch C (2009) Multiword terminology extraction for domain-specific documents. Master’s thesis, École Nationale Supérieure d’Informatique et de Mathématiques Appliquées, Grenoble, 79p
Ramisch C, Villavicencio A, Moura L, Idiart M (2008) Picking them up and figuring them out: verb-particle constructions, noise and idiomaticity. In: Clark A, Toutanova K (eds) Proceedings of the twelfth conference on natural language learning (CoNLL 2008), Manchester. The Coling 2008 Organizing Committee, pp 49–56. http://www.aclweb.org/anthology/W08-2107
Ramisch C, Villavicencio A, Kordoni V (2013) Introduction to the special issue on multiword expressions: from theory to practice and use. ACM Trans Speech Lang Process Spec Issue Multiword Expr Theory Pract Use Part 1 (TSLP) 10(2):1–10
Rapp R (2008) The computation of associative responses to multiword stimuli. In: Zock M, Huang CR (eds) Proceedings of the COLING 2008 workshop on cognitive aspects of the lexicon (COGALEX 2008), Manchester. The Coling 2008 Organizing Committee, pp 102–109. http://www.aclweb.org/anthology/W08-1914
Rayson P, Sharoff S, Adolphs S (eds) (2006) Proceedings of the EACL workshop on multiword expressions in multilingual context (EACL-MWE 2006), Trento. Association for Computational Linguistics, 79p. http://aclweb.org/anthology-new/W/W06/W06-2400
Rayson P, Piao S, Sharoff S, Evert S, Villada Moirón B (2010) Multiword expressions: hard going or plain sailing? Lang Resour Eval Spec Issue Multiword Expr Hard Going or Plain Sailing 44(1–2):1–5. Springer
Sag I, Baldwin T, Bond F, Copestake A, Flickinger D (2002) Multiword expressions: a pain in the neck for NLP. In: Proceedings of the 3rd international conference on intelligent text processing and computational linguistics (CICLing-2002), Mexico-City. Lecture notes in computer science, vol 2276/2010. Springer, pp 1–15
SanJuan E, Dowdall J, Ibekwe-SanJuan F, Rinaldi F (2005) A symbolic approach to automatic multiword term structuring. Comput Speech Lang Spec Issue MWEs 19(4):524–542
Seretan V (2008) Collocation extraction based on syntactic parsing. PhD thesis, University of Geneva, Geneva, 249p
Seretan V (2011) Syntax-based collocation extraction. Text, speech and language technology, vol 44, 1st edn. Springer, Dordrecht, 212p
Sinclair J (1991) Corpus, concordance, collocation. Describing English language, Oxford University Press, Oxford, 179p
Smadja FA (1993) Retrieving collocations from text: Xtract. Comput Linguist 19(1):143–177
Szpakowicz S, Bond F, Nakov P, Kim SN (2013) On the semantics of noun compounds. In: Nat Lang Eng Spec Issue Noun Compd 19(3):289–290. Cambridge University Press, Cambridge
Tanaka T, Villavicencio A, Bond F, Korhonen A (eds) (2004) Proceedings of the ACL workshop on multiword expressions: integrating processing (MWE 2004), Barcelona. Association for Computational Linguistics, 103p. http://aclweb.org/anthology-new/W/W04/W04-0400
Villavicencio A, Bond F, Korhonen A, McCarthy D (2005) Introduction to the special issue on multiword expressions: having a crack at a hard nut. Comput Speech Lang Spec Issue MWEs 19(4):365–377. Elsevier
Villavicencio A, Idiart M, Ramisch C, Araujo VD, Yankama B, Berwick R (2012) Get out but don’t fall down: verb-particle constructions in child language. In: Berwick R, Korhonen A, Poibeau T, Villavicencio A (eds) Proceedings of the EACL 2012 workshop on computational models of language acquisition and loss, Avignon. Association for Computational Linguistics, pp 43–50
Yarowsky D (2001) One sense per collocation. In: Proceedings of the first international conference on human language technology research (HLT 2001), San Diego. Morgan Kaufmann Publishers, pp 266–271
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Ramisch, C. (2015). Definitions and Characteristics. In: Multiword Expressions Acquisition. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-09207-2_2
DOI: https://doi.org/10.1007/978-3-319-09207-2_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09206-5
Online ISBN: 978-3-319-09207-2
eBook Packages: Computer Science, Computer Science (R0)