Abstract
Multi-word terms pose many challenges in Natural Language Processing (NLP) because of their structural ambiguity. Although the structural disambiguation of multi-word expressions, also known as bracketing, has been widely studied, no definitive solution has yet been found. Linguists, terminologists, and translators must also deal with bracketing problems, but they generally have to resolve them without the help of advanced NLP systems. This paper describes a series of manual steps for the bracketing of multi-word terms (MWTs) based on their linguistic properties and recent advances in NLP. Based on an analysis of 100 three- and four-term combinations, a set of criteria for MWT bracketing was devised and arranged in a step-by-step protocol according to frequency and reliability. A case study illustrating the procedure is also presented.
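To make the notion of structural ambiguity concrete, the following minimal sketch enumerates every possible binary bracketing of a short term. It is not the protocol described in the chapter, only an illustration of the search space that bracketing has to resolve. The function names `bracketings` and `render` and the example term are assumptions introduced here for illustration.

```python
def bracketings(words):
    """Return every binary bracketing of a word sequence as nested tuples."""
    if len(words) == 1:
        return [words[0]]
    results = []
    # Try every position at which the sequence can be split in two,
    # then combine each bracketing of the left part with each of the right.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                results.append((left, right))
    return results


def render(tree):
    """Format a nested-tuple bracketing with square brackets."""
    if isinstance(tree, str):
        return tree
    return "[" + render(tree[0]) + " " + render(tree[1]) + "]"


if __name__ == "__main__":
    # Hypothetical example term, not taken from the chapter's data.
    term = ["offshore", "wind", "turbine"]
    for tree in bracketings(term):
        print(render(tree))
```

A three-word term has two candidate structures (left and right bracketing), and a four-word term has five, so the analyst has to choose among a small but growing set of readings as the term gets longer.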
Acknowledgements
This research was carried out as part of project FFI2017-89127-P, Translation-Oriented Terminology Tools for Environmental Texts (TOTEM), funded by the Spanish Ministry of Economy and Competitiveness. Funding was also provided by an FPU grant given by the Spanish Ministry of Education to the first author.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Cabezas-García, M., León-Araúz, P. (2019). On the Structural Disambiguation of Multi-word Terms. In: Corpas Pastor, G., Mitkov, R. (eds) Computational and Corpus-Based Phraseology. EUROPHRAS 2019. Lecture Notes in Computer Science, vol 11755. Springer, Cham. https://doi.org/10.1007/978-3-030-30135-4_4
DOI: https://doi.org/10.1007/978-3-030-30135-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30134-7
Online ISBN: 978-3-030-30135-4
eBook Packages: Computer Science, Computer Science (R0)