Evaluation of MWE Acquisition

Multiword Expressions Acquisition

Abstract

The result of the automatic MWE acquisition methods described in Sects. 3.2.1 and 3.2.2 can be viewed as a list of MWE candidates. We can evaluate the quality of a given approach to MWE acquisition by assessing how useful the resulting candidate list is for a given application. This list often has an internal structure, and each candidate carries attached information drawn from corpora or from external resources. However, if we ignore this extra information (which is often the case), it is possible to define objective criteria for determining the quality of the list and, indirectly, of the acquisition method.
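As an illustration only, one common way to make such objective criteria concrete is to compare the candidate list against a gold-standard list of known MWEs and compute precision, recall and F1. The abstract does not say which criteria the chapter adopts, so the choice of these three measures, the function name `evaluate_candidates`, and the toy data below are all assumptions made for this sketch.

```python
def evaluate_candidates(candidates, gold):
    """Return (precision, recall, f1) of a candidate list against a gold list.

    Both arguments are iterables of strings; any attached information
    (scores, internal structure) is ignored, and duplicates are collapsed.
    """
    cand_set = set(candidates)
    gold_set = set(gold)
    true_pos = len(cand_set & gold_set)  # candidates that are attested MWEs

    precision = true_pos / len(cand_set) if cand_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data, not taken from the chapter.
candidates = ["kick the bucket", "red car", "take off", "the of"]
gold = ["kick the bucket", "take off", "give up"]

precision, recall, f1 = evaluate_candidates(candidates, gold)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Treating the list as an unordered set is the simplest reading of "ignoring the extra information"; a ranked list would call for rank-aware measures instead.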

Notes

  1. http://multiword.sf.net

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Ramisch, C. (2015). Evaluation of MWE Acquisition. In: Multiword Expressions Acquisition. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-09207-2_4

  • DOI: https://doi.org/10.1007/978-3-319-09207-2_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09206-5

  • Online ISBN: 978-3-319-09207-2

  • eBook Packages: Computer Science, Computer Science (R0)
