Abstract
Interest in temporal information in NLP applications has grown in recent years. Building on general temporal theories, annotation schemes, and standards, this paper presents the steps taken towards a parallel English-Romanian corpus with temporal information marked in both languages. The automatic import of the TimeML markup from English to Romanian has a success rate of 96.53%. The paper analyzes the main situations that arose during the automatic import: perfect transfer, impossible transfer, transfer with amendments, and transfer of language-specific phenomena. This corpus study makes it possible to decide how import techniques can be used in the temporal domain.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Forăscu, C. (2008). Why Don’t Romanians Have a Five O’clock Tea, Nor Halloween, But Have a Kind of Valentines Day? In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_7
DOI: https://doi.org/10.1007/978-3-540-78135-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78134-9
Online ISBN: 978-3-540-78135-6
eBook Packages: Computer Science (R0)