
A New Framework for MWE Acquisition

  • Carlos Ramisch
Chapter
Part of the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

In the previous chapters, we motivated the importance of multiword expressions (MWEs) for NLP applications and provided a bibliographic review of past and present research in the area. We are now ready to present our new methodological framework for MWE acquisition. This framework was motivated by the absence of an existing one that covers all the steps of MWE acquisition in a systematic and integrated way. We have therefore developed a methodology in which the process of MWE acquisition is divided into independent modules that can be chained together in different ways. Each module solves a specific task using multiple, complementary techniques.
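
The modular design described above can be illustrated with a minimal Python sketch. It assumes that every module consumes and returns a mapping from candidate expressions to feature values, so that modules can be composed in any order. The module names (`extract`, `min_freq`, `pmi`, `chain`), the part-of-speech pattern, and the toy corpus are hypothetical and do not reproduce the author's actual implementation; pointwise mutual information is used only as one example of an association measure.

```python
import math
from collections import Counter
from functools import reduce
from typing import Callable, Dict, List, Tuple

Token = Tuple[str, str]                        # (lemma, part-of-speech tag)
Candidate = Tuple[str, ...]                    # sequence of lemmas
Features = Dict[Candidate, Dict[str, float]]   # candidate -> named feature values
Module = Callable[[Features], Features]        # hypothetical module interface


def extract(corpus: List[List[Token]], pattern: Tuple[str, ...]) -> Features:
    """Candidate extraction: count lemma n-grams whose POS tags match `pattern`."""
    n = len(pattern)
    counts: Counter = Counter()
    for sentence in corpus:
        for i in range(len(sentence) - n + 1):
            window = sentence[i:i + n]
            if tuple(pos for _, pos in window) == pattern:
                counts[tuple(lemma for lemma, _ in window)] += 1
    return {cand: {"freq": float(c)} for cand, c in counts.items()}


def min_freq(threshold: int) -> Module:
    """Filtering module: drop candidates seen fewer than `threshold` times."""
    return lambda feats: {c: f for c, f in feats.items() if f["freq"] >= threshold}


def pmi(corpus: List[List[Token]]) -> Module:
    """Scoring module: add PMI = log2(f(w1 w2) * N / (f(w1) * f(w2))) to bigrams."""
    unigrams = Counter(lemma for sentence in corpus for lemma, _ in sentence)
    total = sum(unigrams.values())  # N: number of tokens in the corpus

    def score(feats: Features) -> Features:
        out: Features = {}
        for cand, f in feats.items():
            new = dict(f)  # copy so earlier modules' output is not mutated
            if len(cand) == 2:
                f1, f2 = unigrams[cand[0]], unigrams[cand[1]]
                new["pmi"] = math.log2(f["freq"] * total / (f1 * f2))
            out[cand] = new
        return out

    return score


def chain(*modules: Module) -> Module:
    """Compose modules left to right into a single pipeline."""
    return lambda feats: reduce(lambda acc, module: module(acc), modules, feats)


# Usage on a toy, already lemmatized and POS-tagged corpus.
corpus = [
    [("strong", "ADJ"), ("tea", "NOUN"), ("be", "VERB"), ("good", "ADJ")],
    [("strong", "ADJ"), ("tea", "NOUN"), ("help", "VERB")],
    [("green", "ADJ"), ("tea", "NOUN")],
]
pipeline = chain(min_freq(2), pmi(corpus))
result = pipeline(extract(corpus, ("ADJ", "NOUN")))
print(result)
```

Because `min_freq` and `pmi` share the same signature, they could be reordered or replaced by other filters and association measures without touching the rest of the pipeline; only the extraction step, which reads the corpus directly, sits outside the chain.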

Keywords

Mean Average Precision · Verbal Expression · Association Measure · Suffix Array · Lexical Resource
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Carlos Ramisch, Aix Marseille University, Marseille, France