
Interlingual Annotation for MT Development

Conference paper, Machine Translation: From Real Users to Research (AMTA 2004)

Abstract

MT systems that use only superficial representations, including the current generation of statistical MT systems, have been successful and useful. However, they will experience a plateau in quality, much like other “silver bullet” approaches to MT. We pursue work on the development of interlingual representations for use in symbolic or hybrid MT systems. In this paper, we describe the creation of an interlingua and the development of a corpus of semantically annotated text, to be validated in six languages and evaluated in several ways. We have established a distributed, well-functioning research methodology, designed a preliminary interlingua notation, created annotation manuals and tools, developed a test collection in six languages with associated English translations, annotated some 150 translations, and designed and applied various annotation metrics. We describe the data sets being annotated and the interlingual (IL) representation language which uses two ontologies and a systematic theta-role list. We present the annotation tools built and outline the annotation process. Following this, we describe our evaluation methodology and conclude with a summary of issues that have arisen.
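The abstract mentions designing and applying "various annotation metrics" without naming them. One standard way to check whether annotators agree on categorical labels is Cohen's kappa, the chance-corrected agreement statistic (the kappa statistic discussed by Carletta for classification tasks). The sketch below assumes two annotators assigning theta-role labels to the same set of arguments. Kappa is an illustrative choice, not necessarily the metric used in this work, and the label lists are hypothetical data, not drawn from the project's corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each annotator's label
    distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)

    # Observed agreement: fraction of items that received identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of both annotators'
    # marginal frequencies, summed over labels. Labels used by only one
    # annotator contribute zero, so iterating over freq_a is sufficient.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used one identical label for every item. Kappa is
        # formally undefined (0/0); treating this as perfect agreement is a
        # convention assumed here.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical theta-role labels for six arguments.
    annotator_1 = ["Agent", "Theme", "Theme", "Location", "Agent", "Theme"]
    annotator_2 = ["Agent", "Theme", "Location", "Location", "Agent", "Agent"]
    print(round(cohen_kappa(annotator_1, annotator_2), 2))  # → 0.52
```

Here the annotators agree on 4 of 6 items (p_o = 2/3), but their label distributions alone would produce p_e = 11/36 by chance, so kappa is 13/25 = 0.52 rather than the raw 0.67. Discounting chance agreement in this way is why kappa is usually preferred over simple percent agreement when comparing annotations.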




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Reeder, F. et al. (2004). Interlingual Annotation for MT Development. In: Frederking, R.E., Taylor, K.B. (eds) Machine Translation: From Real Users to Research. AMTA 2004. Lecture Notes in Computer Science, vol 3265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30194-3_26


  • DOI: https://doi.org/10.1007/978-3-540-30194-3_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23300-8

  • Online ISBN: 978-3-540-30194-3

  • eBook Packages: Springer Book Archive
