State of the Art in MWE Processing

  • Carlos Ramisch
Chapter
Part of the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

In the previous chapter, we provided the historical and theoretical foundations for the study of multiword expressions (MWEs). The definitions, characteristics and types described there give an idea of the difficulty of the computational tasks involving MWEs. The goal of the present chapter is to give an overview of the state of the art in computational methods for MWE treatment, focusing on acquisition. State-of-the-art techniques for dealing with MWEs are the starting point of the methodology proposed in Chap. 5. The information in the present chapter allows better comparison and contextualisation of the present work within the computational linguistics panorama.

Keywords

Word Sense Disambiguation; Suffix Tree; Association Measure; Parallel Corpus; Suffix Array
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Acosta O, Villavicencio A, Moreira V (2011) Identification and treatment of multiword expressions applied to information retrieval. In: Kordoni V, Ramisch C, Villavicencio A (eds) Proceedings of the ALC workshop on multiword expressions: from parsing and generation to the real world (MWE 2011), Association for Computational Linguistics, Portland, pp 101–109. http://www.aclweb.org/anthology/W/W11/W11-0815
  2. Anastasiou D, Hashimoto C, Nakov P, Kim SN (eds) (2009) Proceedings of the ACL workshop on multiword expressions: identification, interpretation, disambiguation, applications (MWE 2009), Singapore. Association for Computational Linguistics/Suntec. http://aclweb.org/anthology-new/W/W09/W09-29, 70 p.
  3. Apresian J, Boguslavsky I, Iomdin L, Tsinman L (2003) Lexical functions as a tool of ETAP-3. In: Proceedings of the first international conference on meaning-text theory (MTT 2003), ParisGoogle Scholar
  4. Attia M, Toral A, Tounsi L, Pecina P, van Genabith J (2010) Automatic extraction of Arabic multiword expressions. In: Laporte É, Nakov P, Ramisch C, Villavicencio A (eds) Proceedings of the COLING workshop on multiword expressions: from theory to applications (MWE 2010), Beijing. Association for Computational Linguistics, pp 18–26Google Scholar
  5. Baayen RH (2001) Word frequency distributions, text, speech and language technology, vol 18. Springer, Berlin/New YorkCrossRefGoogle Scholar
  6. Bai MH, You JM, Chen KJ, Chang JS (2009) Acquiring translation equivalences of multiword expressions by normalized correlation frequencies. In: Proceedings of the 2009 conference on empirical methods in natural language processing (EMNLP 2009), Singapore. Association for Computational Linguistics/Suntec, pp 478–486Google Scholar
  7. Baldwin T (2005) Deep lexical acquisition of verb-particle constructions. Comput Speech Lang Spec Issue MWEs 19(4):398–414CrossRefGoogle Scholar
  8. Baldwin T (2011) MWEs and topic modelling: enhancing machine learning with linguistics. In: Kordoni V, Ramisch C, Villavicencio A (eds) Proceedings of the ALC workshop on multiword expressions: from parsing and generation to the real world (MWE 2011), Portland. Association for Computational Linguistics, p 1. http://www.aclweb.org/anthology/W/W11/W11-0801
  9. Baldwin T, Tanaka T (2004) Translation by machine of complex nominals: getting it right. In: Tanaka T, Villavicencio A, Bond F, Korhonen A (eds) Proceedings of the ACL workshop on multiword expressions: integrating processing (MWE 2004), Barcelona. Association for Computational Linguistics, pp 24–31Google Scholar
  10. Baldwin T, Bannard C, Tanaka T, Widdows D (2003) An empirical model of multiword expression decomposability. In: Bond F, Korhonen A, McCarthy D, Villavicencio A (eds) Proceedings of the ACL workshop on multiword expressions: analysis, acquisition and treatment (MWE 2003), Sapporo. Association for Computational Linguistics, pp 89–96. doi:10.3115/1119282.1119294, http://www.aclweb.org/anthology/W03-1812
  11. Banerjee S, Pedersen T (2003) The design, implementation, and use of the Ngram Statistic Package. In: Proceedings of the fourth international conference on intelligent text processing and computational linguistics, Mexico City, pp 370–381Google Scholar
  12. Bannard C (2005) Learning about the meaning of verb-particle constructions from corpora. Comput Speech Lang Spec Issue MWEs 19(4):467–478CrossRefGoogle Scholar
  13. Bejček E, Stranak P, Pecina P (2013) Syntactic identification of occurrences of multiword expressions in text using a lexicon with dependency structures. In: Kordoni V, Ramisch C, Villavicencio A (eds) Proceedings of the 9th workshop on multiword expressions (MWE 2013), Atlanta. Association for Computational Linguistics, pp 106–115. http://www.aclweb.org/anthology/W13-1016
  14. Bonin F, Dell’Orletta F, Montemagni S, Venturi G (2010a) A contrastive approach to multi-word extraction from domain-specific corpora. In: Proceedings of the seventh international conference on language resources and evaluation (LREC 2010), Valetta. European Language Resources AssociationGoogle Scholar
  15. Bonin F, Dell’Orletta F, Venturi G, Montemagni S (2010b) Contrastive filtering of domain-specific multi-word terms from different types of corpora. In: Laporte É, Nakov P, Ramisch C, Villavicencio A (eds) Proceedings of the COLING workshop on multiword expressions: from theory to applications (MWE 2010), Beijing. Association for Computational Linguistics, pp 76–79Google Scholar
  16. Bouamor D, Semmar N, Zweigenbaum P (2012) Identifying bilingual multi-word expressions for statistical machine translation. In: Proceedings of the eigth international conference on language resources and evaluation (LREC 2012), Istanbul. European Language Resources AssociationGoogle Scholar
  17. Briscoe T, Carroll J, Watson R (2006) The second release of the RASP system. In: Curran J (ed) Proceedings of the COLING/ACL 2006 interactive presentation sessions, Sidney. Association for Computational Linguistics, pp 77–80. http://www.aclweb.org/anthology/P/P06/P06-4020
  18. Bungum L, Gambäck B, Lynum A, Marsi E (2013) Improving word translation disambiguation by capturing multiword expressions with dictionaries. In: Kordoni V, Ramisch C, Villavicencio A (eds) Proceedings of the 9th workshop on multiword expressions (MWE 2013), Atlanta. Association for Computational Linguistics, pp 21–30. http://www.aclweb.org/anthology/W13-1003
  19. Burnard L (2007) User reference guide for the British National Corpus. Technical report, Oxford University Computing ServicesGoogle Scholar
  20. Butnariu C, Kim SN, Nakov P, Séaghdha DO, Szpakowicz S, Veale T (2010) Semeval-2 task 9: the interpretation of noun compounds using paraphrasing verbs and prepositions. In: Erk K, Strapparava C (eds) Proceedings of the 5th international workshop on semantic evaluation (SemEval 2010), Uppsala. Association for Computational Linguistics, pp 39–44. http://www.aclweb.org/anthology/S10-1007
  21. Carpuat M, Diab M (2010) Task-based evaluation of multiword expressions: a pilot study in statistical machine translation. In: Proceedings of human language technology: the 2010 annual conference of the North American chapter of the Association for Computational Linguistics (NAACL 2003), Los Angeles. Association for Computational Linguistics, pp 242–245. http://www.aclweb.org/anthology/N10-1029
  22. Chen SF, Goodman J (1999) An empirical study of smoothing techniques for language modeling. Comput Speech Lang 13(4):359–394CrossRefGoogle Scholar
  23. Church K, Hanks P (1990) Word association norms mutual information, and lexicography. Comput Linguist 16(1):22–29Google Scholar
  24. Constant M, Sigogne A (2011) MWU-aware part-of-speech tagging with a CRF model and lexical resources. In: Kordoni V, Ramisch C, Villavicencio A (eds) Proceedings of the ALC workshop on multiword expressions: from parsing and generation to the real World (MWE 2011), Portland. Association for Computational Linguistics, pp 49–56. http://www.aclweb.org/anthology/W/W11/W11-0809
  25. Constant M, Roux JL, Sigogne A (2013) Combining compound recognition and PCFG-LA parsing with word lattices and conditional random fields. ACM Trans Speech Lang Process Spec Issue Multiword Expr Theory Pract Use Part 2 (TSLP) 10(3):1–24CrossRefGoogle Scholar
  26. Cook P, Stevenson S (2006) Classifying particle semantics in English verb-particle constructions. In: Moirón BV, Villavicencio A, McCarthy D, Evert S, Stevenson S (eds) Proceedings of the COLING/ACL workshop on multiword expressions: identifying and exploiting underlying properties (MWE 2006), Sidney. Association for Computational Linguistics, pp 45–53. http://www.aclweb.org/anthology/W/W06/W06-1207
  27. Cook P, Fazly A, Stevenson S (2007) Pulling their weight: exploiting syntactic forms for the automatic identification of idiomatic expressions in context. In: Grégoire N, Evert S, Kim SN (eds) Proceedings of the ACL workshop on a broader perspective on multiword expressions (MWE 2007), Prague. Association for Computational Linguistics, pp 41–48. http://www.aclweb.org/anthology/W/W07/W07-1106
  28. Cook P, Fazly A, Stevenson S (2008) The VNC-tokens dataset. In: Grégoire N, Evert S, Krenn B (eds) Proceedings of the LREC workshop towards a shared task for multiword expressions (MWE 2008), Marrakech, pp 19–22Google Scholar
  29. Daille B (2003) Conceptual structuring through term variations. In: Bond F, Korhonen A, McCarthy D, Villavicencio A (eds) Proceedings of the ACL workshop on multiword expressions: analysis, acquisition and treatment (MWE 2003), Sapporo. Association for Computational Linguistics, pp 9–16. doi:10.3115/1119282.1119284. http://www.aclweb.org/anthology/W03-1802
  30. Daille B, Dufour-Kowalski S, Morin E (2004) French-English multi-word term alignment based on lexical context analysis. In: Proceedings of the fourth international conference on language resources and evaluation (LREC 2004), Lisbon. European Language Resources Association, pp 919–922Google Scholar
  31. Déjean H, Gaussier É, Sadat F (2002) An approach based on multilingual thesauri and model combination for bilingual lexicon extraction. In: Proceedings of the 19th international conference on computational linguistics (COLING 2002), Taipei. http://aclweb.org/anthology-new/C/C02/C02-1166.pdf
  32. de Medeiros Caseli H, Villavicencio A, Machado A, Finatto MJ (2009) Statistically-driven alignment-based multiword expression identification for technical domains. In: Anastasiou D, Hashimoto C, Nakov P, Kim SN (eds) Proceedings of the ACL workshop on multiword expressions: identification, interpretation, disambiguation, applications (MWE 2009), Singapore. Association for Computational Linguistics/Suntec, pp 1–8Google Scholar
  33. de Medeiros Caseli H, Ramisch C, das Graças Volpe Nunes M, Villavicencio A (2010) Alignment-based extraction of multiword expressions. Lang Resour Eval Spec Issue Multiword Express Hard Going Plain Sail 44(1–2):59–77. doi:10.1007/s10579-009-9097-9, http://www.springerlink.com/content/H7313427H78865MG
  34. Dias G (2003) Multiword unit hybrid extraction. In: Bond F, Korhonen A, McCarthy D, Villavicencio A (eds) Proceedings of the ACL workshop on multiword expressions: analysis, acquisition and treatment (MWE 2003), Sapporo. Association for Computational Linguistics, pp 41–48. doi:10.3115/1119282.1119288. http://www.aclweb.org/anthology/W03-1806
  35. Duan J, Lu R, Wu W, Hu Y, Tian Y (2006) A bio-inspired approach for multi-word expression extraction. In: Curran J (ed) Proceedings of the COLING/ACL 2006 main conference poster sessions, Sidney. Association for Computational Linguistics, pp 176–182. http://www.aclweb.org/anthology/P/P06/P06-2023
  36. Dunning T (1993) Accurate methods for the statistics of surprise and coincidence. Comput Linguist 19(1):61–74Google Scholar
  37. Duran MS, Ramisch C, Aluísio SM, Villavicencio A (2011) Identifying and analyzing Brazilian Portuguese complex predicates. In: Kordoni V, Ramisch C, Villavicencio A (eds) Proceedings of the ALC workshop on multiword expressions: from parsing and generation to the real world (MWE 2011), Portland. Association for Computational Linguistics, pp 74–82. http://www.aclweb.org/anthology/W/W11/W11-0812
  38. Evert S (2004) The statistics of word cooccurrences: word pairs and collocations. PhD thesis, Institut für maschinelle Sprachverarbeitung, University of Stuttgart, Stuttgart, 353pGoogle Scholar
  39. Evert S, Krenn B (2005) Using small random samples for the manual evaluation of statistical association measures. Comput Speech Lang Spec Issue MWEs 19(4):450–466CrossRefGoogle Scholar
  40. Fazly A, Stevenson S (2007) Distinguishing subtypes of multiword expressions using linguistically-motivated statistical measures. In: Grégoire N, Evert S, Kim SN (eds) Proceedings of the ACL workshop on a broader perspective on multiword expressions (MWE 2007), Prague. Association for Computational Linguistics, pp 9–16. http://www.aclweb.org/anthology/W/W07/W07-1102
  41. Finlayson M, Kulkarni N (2011) Detecting multi-word expressions improves word sense disambiguation. In: Kordoni V, Ramisch C, Villavicencio A (eds) Proceedings of the ALC workshop on multiword expressions: from parsing and generation to the real world (MWE 2011), Portland. Association for Computational Linguistics, pp 20–24. http://www.aclweb.org/anthology/W/W11/W11-0805
  42. Frantzi K, Ananiadou S, Mima H (2000) Automatic recognition of multiword terms: the C-value/NC-value method. Int J Digit Libr 3(2):115–130CrossRefGoogle Scholar
  43. Fritzinger F, Weller M, Heid U (2010) A survey of idiomatic preposition-noun-verb triples on token level. In: Proceedings of the seventh international conference on language resources and evaluation (LREC 2010), Valetta. European Language Resources Association, pp 2908–2914Google Scholar
  44. Gil A, Dias G (2003) Using masks, suffix array-based data structures and multidimensional arrays to compute positional n-gram statistics from corpora. In: Bond F, Korhonen A, McCarthy D, Villavicencio A (eds) Proceedings of the ACL workshop on multiword expressions: analysis, acquisition and treatment (MWE 2003), Sapporo. Association for Computational Linguistics, pp 25–32. doi:10.3115/1119282.1119286, http://www.aclweb.org/anthology/W03-1804
  45. Girju R, Moldovan D, Tatu M, Antohe D (2005) On the semantics of noun compounds. Comput Speech Lang Spec Issue MWEs 19(4):479–496CrossRefGoogle Scholar
  46. Good IJ (1953) The population frequencies of species and the estimation of population parameters. Biometrika 40(3–4):237–264. doi:10.1093/biomet/40.3-4.237MathSciNetCrossRefzbMATHGoogle Scholar
  47. Graliński F, Savary A, Czerepowicka M, Makowiecki F (2010) Computational lexicography of multi-word units: how efficient can it be? In: Laporte É, Nakov P, Ramisch C, Villavicencio A (eds) Proceedings of the COLING workshop on multiword expressions: from theory to applications (MWE 2010), Beijing. Association for Computational Linguistics, pp 1–9Google Scholar
  48. Green S, de Marneffe MC, Bauer J, Manning CD (2011) Multiword expression identification with tree substitution grammars: a parsing tour de force with French. In: Barzilay R, Johnson M (eds) Proceedings of the 2011 conference on empirical methods in natural language processing (EMNLP 2011), Edinburgh. Association for Computational Linguistics, pp 725–735. http://www.aclweb.org/anthology/D11-1067
  49. Grefenstette G (1999) The world wide web as a resource for example-based machine translation tasks. In: Proceedings of the twenty-first international conference on translating and the computer, ASLIB, LondonGoogle Scholar
  50. Grégoire N (2007) Design and implementation of a lexicon of Dutch multiword expressions. In: Grégoire N, Evert S, Kim SN (eds) Proceedings of the ACL workshop on a broader perspective on multiword expressions (MWE 2007), Prague. Association for Computational Linguistics, pp 17–24. http://www.aclweb.org/anthology/W/W07/W07-1103
  51. Grégoire N (2010) DuELME: a Dutch electronic lexicon of multiword expressions. Lang Resour Eval Spec Issue Multiword Expr Hard Going Plain Sail 44(1–2):23–39. doi:10.1007/s10579-009-9094-z. http://www.springerlink.com/content/7308605442W17698
  52. Grégoire N, Evert S, Krenn B (eds) (2008) Proceedings of the LREC workshop towards a shared task for multiword expressions (MWE 2008), Marrakech, 57p. http://www.lrec-conf.org/proceedings/lrec2008/workshops/W20_Proceedings.pdf
  53. Gurrutxaga A, Alegria I (2011) Automatic extraction of NV expressions in Basque: basic issues on cooccurrence techniques. In: Kordoni V, Ramisch C, Villavicencio A (eds) Proceedings of the ALC workshop on multiword expressions: from parsing and generation to the real world (MWE 2011), Portland. Association for Computational Linguistics, pp 2–7. http://www.aclweb.org/anthology/W/W11/W11-0802
  54. Haugereid P, Bond F (2011) Extracting transfer rules for multiword expressions from parallel corpora. In: Kordoni V, Ramisch C, Villavicencio A (eds) Proceedings of the ALC workshop on multiword expressions: from parsing and generation to the real world (MWE 2011), Portland. Association for Computational Linguistics, pp 92–100. http://www.aclweb.org/anthology/W/W11/W11-0814
  55. Hendrickx I, Kim SN, Kozareva Z, Nakov P, Séaghdha DO, Padó S, Pennacchiotti M, Romano L, Szpakowicz S (2010) Semeval-2010 task 8: multi-way classification of semantic relations between pairs of nominals. In: Erk K, Strapparava C (eds) Proceedings of the 5th international workshop on semantic evaluation (SemEval 2010), Uppsala. Association for Computational Linguistics, pp 33–38. http://www.aclweb.org/anthology/S10-1006
  56. Hoang HH, Kim SN, Kan MY (2009) A re-examination of lexical association measures. In: Anastasiou D, Hashimoto C, Nakov P, Kim SN (eds) Proceedings of the ACL workshop on multiword expressions: identification, interpretation, disambiguation, applications (MWE 2009), Singapore. Association for Computational Linguistics/Suntec, pp 31–39Google Scholar
  57. Hogan D, Foster J, van Genabith J (2011) Decreasing lexical data sparsity in statistical syntactic parsing – experiments with named entities. In: Kordoni V, Ramisch C, Villavicencio A (eds) Proceedings of the ALC workshop on multiword expressions: from parsing and generation to the real world (MWE 2011), Portland. Association for Computational Linguistics, pp 14–19. http://www.aclweb.org/anthology/W/W11/W11-0804
  58. Izumi T, Imamura K, Kikui G, Sato S (2010) Standardizing complex functional expressions in Japanese predicates: applying theoretically-based paraphrasing rules. In: Laporte É, Nakov P, Ramisch C, Villavicencio A (eds) Proceedings of the COLING workshop on multiword expressions: from theory to applications (MWE 2010), Beijing. Association for Computational Linguistics, pp 63–71Google Scholar
  59. Jurafsky D, Martin JH (2008) Speech and language processing, 2nd edn. Prentice Hall, Upper Saddle River, 1024pGoogle Scholar
  60. Justeson JS, Katz SM (1995) Technical terminology: some linguistic properties and an algorithm for identification in text. Nat Lang Eng 1(1):9–27CrossRefGoogle Scholar
  61. Keller F, Lapata M (2003) Using the web to obtain frequencies for unseen bigrams. Comput Linguist Spec Issue Web Corpus 29(3):459–484CrossRefGoogle Scholar
  62. Kim SN, Baldwin T (2013) A lexical semantic approach to interpreting and bracketing English noun compounds. Nat Lang Eng Spec Issue Noun Compd 19(3):385–407. doi:10.1017/S1351324913000107, http://journals.cambridge.org/article_S1351324913000107
  63. Kim SN, Nakov P (2011) Large-scale noun compound interpretation using bootstrapping and the web as a corpus. In: Barzilay R, Johnson M (eds) Proceedings of the 2011 conference on empirical methods in natural language processing (EMNLP 2011), Edinburgh. Association for Computational Linguistics, pp 648–658. http://www.aclweb.org/anthology/D11-1060
  64. Kneser R, Ney H (1995) Improved backing-off for M-gram language modeling. In: Proceedings of the international conference on acoustics, speech, and signal processing (ICASSP 1995), Detroit, vol 1, pp 181–184. doi:10.1109/ICASSP.1995.479394, http://dx.doi.org/10.1109/ICASSP.1995.479394
  65. Koehn P (2005) Europarl: a parallel corpus for statistical machine translation. In: Proceedings of the tenth machine translation summit (MT Summit 2005), Phuket. Asian-Pacific Association for Machine Translation, pp 79–86Google Scholar
  66. Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th annual meeting of the Association for Computational Linguistics (ACL 2007), Prague. Association for Computational Linguistics, pp 177–180Google Scholar
  67. Korkontzelos I, Manandhar S (2010) Can recognising multiword expressions improve shallow parsing? In: Proceedings of human language technology: the 2010 annual conference of the North American chapter of the Association for Computational Linguistics (NAACL 2003), Los Angeles. Association for Computational Linguistics, pp 636–644. http://www.aclweb.org/anthology/N10-1089
  68. Kulkarni N, Finlayson M (2011) jMWE: a java toolkit for detecting multi-word expressions. In: Kordoni V, Ramisch C, Villavicencio A (eds) Proceedings of the ALC workshop on multiword expressions: from parsing and generation to the real world (MWE 2011), Portland. Association for Computational Linguistics, pp 122–124. http://www.aclweb.org/anthology/W/W11/W11-0818
  69. Lapata M (2002) The disambiguation of nominalizations. Comput Linguist 28(3):357–388CrossRefGoogle Scholar
  70. Laporte É, Voyatzi S (2008) An electronic dictionary of French multiword adverbs. In: Grégoire N, Evert S, Krenn B (eds) Proceedings of the LREC workshop towards a shared task for multiword expressions (MWE 2008), Marrakech, pp 31–34Google Scholar
  71. Laporte É, Nakamura T, Voyatzi S (2008) A French corpus annotated for multiword nouns. In: Grégoire N, Evert S, Krenn B (eds) Proceedings of the LREC workshop towards a shared task for multiword expressions (MWE 2008), Marrakech, pp 27–30Google Scholar
  72. Li Z, Callison-Burch C, Dyer C, Ganitkevitch J, Khudanpur S, Schwartz L, Thornton WNG, Weese J, Zaidan OF (2009) Joshua: an open source toolkit for parsing-based machine translation. In: Proceedingsof the fourth workshop on statistical machine translation (WMT 2009), Athens. Association for Computational Linguistics, pp 135–139Google Scholar
  73. Manber U, Myers G (1990) Suffix arrays: a new method for on-line string searches. In: SODA ’90: proceedings of the first annual ACM-SIAM symposium on discrete algorithms, San Francisco. Society for Industrial and Applied Mathematics, Philadelphia, pp 319–327Google Scholar
  74. Manning CD, Schütze H (1999) Foundations of statistical natural language processing. MIT, Cambridge, 620pzbMATHGoogle Scholar
  75. Martens S (2010) Varro: an algorithm and toolkit for regular structure discovery in treebanks. In: Huang CR, Jurafsky D (eds) Proceedings of the 23rd international conference on computational linguistics (COLING 2010)—posters, Beijing. The Coling 2010 Organizing Committee, pp 810–818. http://www.aclweb.org/anthology/C10-2093
  76. Martens S, Vandeghinste V (2010) An efficient, generic approach to extracting multi-word expressions from dependency trees. In: Laporte É, Nakov P, Ramisch C, Villavicencio A (eds) Proceedings of the COLING workshop on multiword expressions: from theory to applications (MWE 2010), Beijing. Association for Computational Linguistics, pp 84–87Google Scholar
  77. McCarthy D, Keller B, Carroll J (2003) Detecting a continuum of compositionality in phrasal verbs. In: Bond F, Korhonen A, McCarthy D, Villavicencio A (eds) Proceedings of the ACL workshop on multiword expressions: analysis, acquisition and treatment (MWE 2003), Sapporo. Association for Computational Linguistics, pp 73–80. doi:10.3115/1119282.1119292, http://www.aclweb.org/anthology/W03-1810
  78. McCarthy D, Venkatapathy S, Joshi A (2007) Detecting compositionality of verb-object combinations using selectional preferences. In: Eisner J (ed) Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL 2007), Prague. Association for Computational Linguistics, pp 369–379. http://www.aclweb.org/anthology/D/D07/D07-1039
  79. Melamed ID (1997) Automatic discovery of non-compositional compounds in parallel data. In: Proceedings of the 2nd conference on empirical methods in natural language processing (EMNLP-2), Brown University, Providence. Association for Computational Linguistics, pp 97–108Google Scholar
  80. Michou A, Seretan V (2009) A tool for multi-word expression extraction in modern Greek using syntactic parsing. In: Proceedings of the demonstrations session at EACL 2009, Athens. Association for Computational Linguistics, pp 45–48Google Scholar
  81. Mikheev A (2002) Periods, capitalized words, etc. Comput Linguist 28(3):289–318CrossRefGoogle Scholar
  82. Mirroshandel SA, Nasr A, Roux JL (2012) Semi-supervised dependency parsing using lexical affinities. In: Proceedings of the 50th annual meeting of the Association for Computational Linguistics (vol 1: long papers), Jeju Island. Association for Computational Linguistics, pp 777–785. http://www.aclweb.org/anthology/P12-1082
  83. Mitkov R, Monti J, Pastor GC, Seretan V (eds) (2013) Proceedings of the MT summit 2013 workshop on multi-word units in machine translation and translation technology (MUMTTT 2013), Nice. European Association for Machine Translation, 71p. http://www.mtsummit2013.info/workshop4.asp
  84. Monti J, Barreiro A, Elia A, Marano F, Napoli A (2011) Taking on new challenges in multi-word unit processing for machine translation. In: Proceedings of the second international workshop on free/open-source rule-based machine translation, BarcelonaGoogle Scholar
  85. Morin E, Daille B (2010) Compositionality and lexical alignment of multi-word terms. Lang Resour Eval Spec Issue Multiword Express Hard Going Plain Sail 44(1–2):79–95. doi:10.1007/s10579-009-9098-8, http://www.springerlink.com/content/30264870R1K04744
  86. Nakov P (2007) Using the web as an implicit training set: application to noun compound syntax and semantics. PhD thesis, EECS Department, University of California, Berkeley, 392pGoogle Scholar
  87. Nakov P (2008a) Improved statistical machine translation using monolingual paraphrases. In: Ghallab M, Spyropoulos CD, Fakotakis N, Avouris NM (eds) Proceedings of the 18th European conference on artificial intelligence (ECAI 2008), Patras. Frontiers in Artificial Intelligence and Applications, vol 178. IOS Press, pp 338–342Google Scholar
  88. Nakov P (2008b) Paraphrasing verbs for noun compound interpretation. In: Grégoire N, Evert S, Krenn B (eds) Proceedings of the LREC workshop towards a shared task for multiword expressions (MWE 2008), Marrakech, pp 46–49Google Scholar
  89. Nakov P (2013) On the interpretation of noun compounds: syntax, semantics, and entailment. Nat Lang Eng Spec Issue Noun Compd 19(3):291–330. doi:10.1017/S1351324913000065, http://journals.cambridge.org/article_S1351324913000065
  90. Nakov P, Hearst MA (2005) Search engine statistics beyond the n-gram: application to noun compound bracketing. In: Dagan I, Gildea D (eds) Proceedings of the ninth conference on natural language learning (CoNLL-2005), University of Michigan, Ann Arbor. Association for Computational Linguistics, pp 17–24. http://www.aclweb.org/anthology/W/W05/W05-0603
  91. Nakov P, Hearst MA (2008) Solving relational similarity problems using the web as a corpus. In: Proceedings of the 46th annual meeting of the Association for Computational Linguistics: human language technology (ACL-08: HLT), Columbus. Association for Computational Linguistics, pp 452–460Google Scholar
  92. Nasr A, Bechet F, Rey JF, Favre B, Roux JL (2011) MACAON an NLP tool suite for processing word lattices. In: Proceedings of the ACL 2011 system demonstrations, Portland. Association for Computational Linguistics, pp 86–91. http://www.aclweb.org/anthology/P11-4015
  93. Newman MEJ (2005) Power laws, pareto distributions and zipf’s law. Contemp Phys 46:323–351CrossRefGoogle Scholar
  94. Nicholson J, Baldwin T (2006) Interpretation of compound nominalisations using corpus and web statistics. In: Moirón BV, Villavicencio A, McCarthy D, Evert S, Stevenson S (eds) Proceedings of the COLING/ACL workshop on multiword expressions: identifying and exploiting underlying properties (MWE 2006), Sidney. Association for Computational Linguistics, pp 54–61. http://www.aclweb.org/anthology/W/W06/W06-1208
  95. Nicholson J, Baldwin T (2008) Interpreting compound nominalisations. In: Grégoire N, Evert S, Krenn B (eds) Proceedings of the LREC workshop towards a shared task for multiword expressions (MWE 2008), Marrakech, pp 43–45Google Scholar
  96. Nulty P, Costello F (2010) UCD-PN: Selecting general paraphrases using conditional probability. In: Erk K, Strapparava C (eds) Proceedings of the 5th international workshop on semantic evaluation (SemEval 2010), Uppsala. Association for Computational Linguistics, pp 234–237. http://www.aclweb.org/anthology/S10-1052
  97. Nulty P, Costello F (2013) General and specific paraphrases of semantic relations between nouns. Nat Lang Eng Spec Issue Noun Compd 19(3):357–384. doi:10.1017/S1351324913000089, http://journals.cambridge.org/article_S1351324913000089
  98. Pal S, Naskar SK, Pecina P, Bandyopadhyay S, Way A (2010) Handling named entities and compound verbs in phrase-based statistical machine translation. In: Laporte É, Nakov P, Ramisch C, Villavicencio A (eds) Proceedings of the COLING workshop on multiword expressions: from theory to applications (MWE 2010), Beijing. Association for Computational Linguistics, pp 45–53Google Scholar
  99. Pearce D (2002) A comparative evaluation of collocation extraction techniques. In: Proceedings of the third international conference on language resources and evaluation (LREC 2002), Las Palmas. European Language Resources Association, pp 1530–1536Google Scholar
  100. Pecina P (2005) An extensive empirical study of collocation extraction methods. In: Proceedings of the ACL 2005 student research workshop, Ann Arbor. Association for Computational Linguistics, pp 13–18. http://www.aclweb.org/anthology/P/P05/P05-2003
  101. Pecina P (2008) Reference data for Czech collocation extraction. In: Grégoire N, Evert S, Krenn B (eds) Proceedings of the LREC workshop towards a shared task for multiword expressions (MWE 2008), Marrakech, pp 11–14Google Scholar
  102. Pedersen T, Banerjee S, McInnes B, Kohli S, Joshi M, Liu Y (2011) The n-gram statistics package (text::NSP): a flexible tool for identifying n-grams, collocations, and word associations. In: Kordoni V, Ramisch C, Villavicencio A (eds) Proceedings of the ALC workshop on multiword expressions: from parsing and generation to the real world (MWE 2011), Portland. Association for Computational Linguistics, pp 131–133. http://www.aclweb.org/anthology/W/W11/W11-0821
  103. Planas E, Furuse O (2000) Multi-level similar segment matching algorithm for translation memories and example-based machine translation. In: Proceedings of the 18th international conference on computational linguistics (COLING 2000), Saarbrücken. http://aclweb.org/anthology-new/C/C00/C00-2090.pdf
  104. Ramisch C (2009) Multiword terminology extraction for domain-specific documents. Master’s thesis, École Nationale Supérieure d’Informatique et de Mathématiques Appliquées, Grenoble, 79pGoogle Scholar
  105. Ramisch C, Villavicencio A, Moura L, Idiart M (2008) Picking them up and figuring them out: verb-particle constructions, noise and idiomaticity. In: Clark A, Toutanova K (eds) Proceedings of the twelfth conference on natural language learning (CoNLL 2008), Manchester. The Coling 2008 Organizing Committee, pp 49–56. http://www.aclweb.org/anthology/W08-2107
  106. Ramisch C, de Medeiros Caseli H, Villavicencio A, Machado A, Finatto MJ (2010) A hybrid approach for multiword expression identification. In: Proceedings of the 9th international conference on computational processing of Portuguese language (PROPOR 2010), Porto Alegre. Lecture notes in computer science (Lecture notes in artificial intelligence), vol 6001. Springer, pp 65–74. doi:10.1007/978-3-642-12320-7_9, http://www.springerlink.com/content/978-3-642-12319-1
  107. Ren Z, Lü Y, Cao J, Liu Q, Huang Y (2009) Improving statistical machine translation using domain bilingual multiword expressions. In: Anastasiou D, Hashimoto C, Nakov P, Kim SN (eds) Proceedings of the ACL workshop on multiword expressions: identification, interpretation, disambiguation, applications (MWE 2009), Singapore. Association for Computational Linguistics/Suntec, pp 47–54
  108. Roller S, im Walde SS, Scheible S (2013) The (un)expected effects of applying standard cleansing models to human ratings on compositionality. In: Kordoni V, Ramisch C, Villavicencio A (eds) Proceedings of the 9th workshop on multiword expressions (MWE 2013), Atlanta. Association for Computational Linguistics, pp 32–41. http://www.aclweb.org/anthology/W13-1005
  109. Sag I, Baldwin T, Bond F, Copestake A, Flickinger D (2002) Multiword expressions: a pain in the neck for NLP. In: Proceedings of the 3rd international conference on intelligent text processing and computational linguistics (CICLing-2002), Mexico City. Lecture notes in computer science, vol 2276/2010. Springer, pp 1–15
  110. SanJuan E, Dowdall J, Ibekwe-SanJuan F, Rinaldi F (2005) A symbolic approach to automatic multiword term structuring. Comput Speech Lang Spec Issue MWEs 19(4):524–542
  111. Schmid H (1994) Probabilistic part-of-speech tagging using decision trees. In: Proceedings of the international conference on new methods in language processing, Manchester, pp 44–49. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.1139
  112. Schone P, Jurafsky D (2001) Is knowledge-free induction of multiword unit dictionary headwords a solved problem? In: Lee L, Harman D (eds) Proceedings of the 2001 conference on empirical methods in natural language processing (EMNLP 2001), Pittsburgh. Association for Computational Linguistics, pp 100–108
  113. Schuler W, Joshi A (2011) Tree-rewriting models of multi-word expressions. In: Kordoni V, Ramisch C, Villavicencio A (eds) Proceedings of the ACL workshop on multiword expressions: from parsing and generation to the real world (MWE 2011), Portland. Association for Computational Linguistics, pp 25–30. http://www.aclweb.org/anthology/W/W11/W11-0806
  114. Séaghdha DÓ, Copestake A (2013) Interpreting compound nouns with kernel methods. Nat Lang Eng Spec Issue Noun Compd 19(3):331–356. doi:10.1017/S1351324912000368, http://journals.cambridge.org/article_S1351324912000368
  115. Seretan V (2008) Collocation extraction based on syntactic parsing. PhD thesis, University of Geneva, Geneva, 249p
  116. Seretan V (2011) Syntax-based Collocation extraction, text, speech and language technology, vol 44, 1st edn. Springer, Dordrecht, 212p
  117. Seretan V, Wehrli E (2006) Multilingual collocation extraction: issues and solutions. In: Witt A, Sérasset G, Armstrong S, Breen J, Heid U, Sasaki F (eds) Proceedings of the ACL workshop on multilingual language resources and interoperability, Sydney. Association for Computational Linguistics, pp 40–49. http://www.aclweb.org/anthology/W/W06/W06-1006
  118. Seretan V, Wehrli E (2009) Multilingual collocation extraction with a syntactic parser. Lang Resour Eval Spec Issue Multiling Lang Resour Interoper 43(1):71–85. doi:10.1007/s10579-008-9075-7, http://www.springerlink.com/content/341877K50497682X
  119. Seretan V, Wehrli E (2011) Fipscoview: on-line visualisation of collocations extracted from multilingual parallel corpora. In: Kordoni V, Ramisch C, Villavicencio A (eds) Proceedings of the ACL workshop on multiword expressions: from parsing and generation to the real world (MWE 2011), Portland. Association for Computational Linguistics, pp 125–127. http://www.aclweb.org/anthology/W/W11/W11-0819
  120. Silva J, Lopes G (1999) A local maxima method and a fair dispersion normalization for extracting multi-word units from corpora. In: Proceedings of the sixth meeting on mathematics of language (MOL6), Orlando, pp 369–381
  121. Silva J, Lopes G (2010) Towards automatic building of document keywords. In: Huang CR, Jurafsky D (eds) Proceedings of the 23rd international conference on computational linguistics (COLING 2010)—posters, Beijing. The Coling 2010 Organizing Committee, pp 1149–1157. http://www.aclweb.org/anthology/C10-2132
  122. da Silva JF, Dias G, Guilloré S, Lopes JGP (1999) Using localmaxs algorithm for the extraction of contiguous and non-contiguous multiword lexical units. In: Proceedings of the 9th Portuguese conference on artificial intelligence: progress in artificial intelligence, London. EPIA 1999, pp 113–132. Springer. http://dl.acm.org/citation.cfm?id=645377.651205
  123. Smadja FA (1993) Retrieving collocations from text: xtract. Comput Linguist 19(1):143–177
  124. Stymne S (2009) A comparison of merging strategies for translation of German compounds. In: Proceedings of the student research workshop at EACL 2009, Athens, pp 61–69
  125. Stymne S (2011) Pre- and postprocessing for statistical machine translation into Germanic languages. In: Proceedings of the ACL 2011 student research workshop, Portland. Association for Computational Linguistics, pp 12–17. http://www.aclweb.org/anthology/P11-3003
  126. Szpakowicz S, Bond F, Nakov P, Kim SN (2013) On the semantics of noun compounds. In: Nat Lang Eng Spec Issue Noun Compd 19(3):289–290. Cambridge University Press, Cambridge
  127. Tanaka T, Baldwin T (2003) Noun-noun compound machine translation: a feasibility study on shallow processing. In: Bond F, Korhonen A, McCarthy D, Villavicencio A (eds) Proceedings of the ACL workshop on multiword expressions: analysis, acquisition and treatment (MWE 2003), Sapporo. Association for Computational Linguistics, pp 17–24. doi:10.3115/1119282.1119285. http://www.aclweb.org/anthology/W03-1803
  128. Tsvetkov Y, Wintner S (2010) Extraction of multi-word expressions from small parallel corpora. In: Huang CR, Jurafsky D (eds) Proceedings of the 23rd international conference on computational linguistics (COLING 2010)—posters, Beijing. The Coling 2010 Organizing Committee, pp 1256–1264. http://www.aclweb.org/anthology/C10-2144
  129. Tsvetkov Y, Wintner S (2011) Identification of multi-word expressions by combining multiple linguistic information sources. In: Barzilay R, Johnson M (eds) Proceedings of the 2011 conference on empirical methods in natural language processing (EMNLP 2011), Edinburgh. Association for Computational Linguistics, pp 836–845. http://www.aclweb.org/anthology/D11-1077
  130. Uchiyama K, Baldwin T, Ishizaki S (2005) Disambiguating Japanese compound verbs. Comput Speech Lang Spec Issue MWEs 19(4):497–512
  131. Uresova Z, Hajic J, Fucikova E, Sindlerova J (2013) An analysis of annotation of verb-noun idiomatic combinations in a parallel dependency corpus. In: Kordoni V, Ramisch C, Villavicencio A (eds) Proceedings of the 9th workshop on multiword expressions (MWE 2013), Atlanta. Association for Computational Linguistics, pp 58–63. http://www.aclweb.org/anthology/W13-1009
  132. Venkatapathy S, Joshi AK (2006) Using information about multi-word expressions for the word-alignment task. In: Moirón BV, Villavicencio A, McCarthy D, Evert S, Stevenson S (eds) Proceedings of the COLING/ACL workshop on multiword expressions: identifying and exploiting underlying properties (MWE 2006), Sydney. Association for Computational Linguistics, pp 20–27. http://www.aclweb.org/anthology/W/W06/W06-1204
  133. Villavicencio A, Bond F, Korhonen A, McCarthy D (2005) Introduction to the special issue on multiword expressions: having a crack at a hard nut. Comput Speech Lang Spec Issue MWEs 19(4):365–377
  134. Villavicencio A, Kordoni V, Zhang Y, Idiart M, Ramisch C (2007) Validation and evaluation of automatically acquired multiword expressions for grammar engineering. In: Eisner J (ed) Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL 2007), Prague. Association for Computational Linguistics, pp 1034–1043. http://www.aclweb.org/anthology/D/D07/D07-1110
  135. Vincze V, Nagy TI, Berend G (2011) Detecting noun compounds and light verb constructions: a contrastive study. In: Kordoni V, Ramisch C, Villavicencio A (eds) Proceedings of the ACL workshop on multiword expressions: from parsing and generation to the real world (MWE 2011), Portland. Association for Computational Linguistics, pp 116–121. http://www.aclweb.org/anthology/W/W11/W11-0817
  136. Wehrli E (1998) Translating idioms. In: Proceedings of the 36th annual meeting of the Association for Computational Linguistics and 17th international conference on computational linguistics, Montreal, vol 2. Association for Computational Linguistics, pp 1388–1392. doi:10.3115/980691.980795. http://www.aclweb.org/anthology/P98-2226
  137. Wehrli E, Seretan V, Nerima L (2010) Sentence analysis and collocation identification. In: Laporte É, Nakov P, Ramisch C, Villavicencio A (eds) Proceedings of the COLING workshop on multiword expressions: from theory to applications (MWE 2010), Beijing. Association for Computational Linguistics, pp 27–35
  138. Wermter J, Hahn U (2006) You can’t beat frequency (unless you use linguistic knowledge) – a qualitative evaluation of association measures for collocation and term extraction. In: Proceedings of the 21st international conference on computational linguistics and 44th annual meeting of the association for computational linguistics (COLING/ACL 2006), Sydney. Association for Computational Linguistics, pp 785–792
  139. Xu Y, Goebel R, Ringlstetter C, Kondrak G (2010) Application of the tightness continuum measure to Chinese information retrieval. In: Laporte É, Nakov P, Ramisch C, Villavicencio A (eds) Proceedings of the COLING workshop on multiword expressions: from theory to applications (MWE 2010), Beijing. Association for Computational Linguistics, pp 54–62
  140. Yamamoto M, Church K (2001) Using suffix arrays to compute term frequency and document frequency for all substrings in a corpus. Comput Linguist 27(1):1–30
  141. Zarrieß S, Kuhn J (2009) Exploiting translational correspondences for pattern-independent MWE identification. In: Anastasiou D, Hashimoto C, Nakov P, Kim SN (eds) Proceedings of the ACL workshop on multiword expressions: identification, interpretation, disambiguation, applications (MWE 2009), Singapore. Association for Computational Linguistics/Suntec, pp 23–30
  142. Zhang Y, Kordoni V (2006) Automated deep lexical acquisition for robust open texts processing. In: Proceedings of the sixth international conference on language resources and evaluation (LREC 2006), Genoa. European Language Resources Association, pp 275–280

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Carlos Ramisch (1)
  1. Aix Marseille University, Marseille, France
