Abstract
Machine translation (MT) systems that use only superficial representations, including the current generation of statistical MT systems, have been successful and useful. However, they will experience a plateau in quality, much like other “silver bullet” approaches to MT. We therefore pursue the development of interlingual representations for use in symbolic or hybrid MT systems. In this paper, we describe the creation of an interlingua and the development of a corpus of semantically annotated text, to be validated in six languages and evaluated in several ways. We have established a distributed, well-functioning research methodology, designed a preliminary interlingua notation, created annotation manuals and tools, developed a test collection in six languages with associated English translations, annotated some 150 translations, and designed and applied various annotation metrics. We describe the data sets being annotated and the interlingual (IL) representation language, which uses two ontologies and a systematic theta-role list. We present the annotation tools we built and outline the annotation process. We then describe our evaluation methodology and conclude with a summary of issues that have arisen.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Reeder, F. et al. (2004). Interlingual Annotation for MT Development. In: Frederking, R.E., Taylor, K.B. (eds) Machine Translation: From Real Users to Research. AMTA 2004. Lecture Notes in Computer Science, vol 3265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30194-3_26
DOI: https://doi.org/10.1007/978-3-540-30194-3_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23300-8
Online ISBN: 978-3-540-30194-3
eBook Packages: Springer Book Archive