Abstract
This chapter presents the language-specific adaptation of the TimeML annotation scheme to Italian and the creation of the Ita-TimeBank, a language resource composed of two corpora manually annotated with temporal and event information. Particular attention is given to the methodology followed in developing the corpora: the annotation guidelines document the actual choices made during annotation and address language-specific issues while maintaining adherence to the specifications. The guidelines are supplemented with decision-tree-like instructions and tests that are grounded in linguistic analysis but theory-independent. The results obtained show the reliability both of the adaptation of the annotation specifications to Italian and of the methodology used to create the resources.
Notes
- 1.
TempEval 2007 [45]: http://www.timeml.org/tempeval/; TempEval 2010 [46]: http://www.timeml.org/tempeval2/; TempEval 2013 [42]: http://www.cs.york.ac.uk/semeval-2013/task1/.
- 2.
- 3.
Using the full set of grammatical tense forms and viewpoints, the table would contain 64 combinations.
- 4.
firmato [signed] AFTER chiesto [asked].
- 5.
The corpora are named after the research institutes where they were initially developed: the “Center for the Evaluation of Language and Communication Technologies” (CELCT) and the “Istituto di Linguistica Computazionale ‘A. Zampolli’ - CNR Pisa” (ILC), respectively.
- 6.
- 7.
- 8.
Please note that in the CELCT Corpus the number of annotated temporal expressions is calculated on a total of 180,000 tokens (i.e. 525 files), while the number of events, signals and links is calculated on more than 90,000 tokens (i.e. 283 files).
- 9.
The DTD document is available at https://sites.google.com/site/ittimeml/documents.
- 10.
- 11.
The CELCT score is computed on the basis of the Dice coefficient. We did not report it here as it is not directly comparable with the kappa score.
- 12.
References
AA.VV.: Funzioni delle frasi subordinative. In: Renzi, L., Salvi, G., Cardinaletti, A. (eds.) Grande Grammatica Italiana di Consultazione. I sintagmi verbale, aggettivale e avverbiale. La Subordinazione, vol. II, pp. 633–853. Il Mulino (2001)
Angeli, G., Uszkoreit, J.: Language-independent discriminative parsing of temporal expressions. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 83–92. Association for Computational Linguistics, Sofia (2013)
Artstein, R., Poesio, M.: Inter-coder agreement for computational linguistics. Comput. Linguist. 34(4), 555–596 (2008)
Bach, E.: The algebra of events. Linguist. Philos. 9, 5–16 (1986)
Bartalesi Lenzi, V., Moretti, G., Sprugnoli, R.: CAT: the CELCT annotation tool. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation, pp. 333–338 (2012)
Bertinetto, P.: Tempo, Aspetto e Azione nel verbo Italiano. Il sistema dell’indicativo. Accademia della Crusca, Firenze (1986)
Bertinetto, P.: Le strutture tempo-aspettuali dell’italiano e dell’inglese a confronto. In: Mocciaro, A.G., Soravia, G. (eds.) L’Europa linguistica: contatti, contrasti, e affinità di lingue, pp. 49–68. SLI, Atti XXI Congresso Internazionale di Studi, Bulzoni (1992)
Bertinetto, P.: Il verbo. In: Renzi, L., Salvi, G., Cardinaletti, A. (eds.) Grande Grammatica Italiana di Consultazione. I sintagmi verbale, aggettivale e avverbiale. La Subordinazione, vol. II, pp. 13–162. Il Mulino (2001)
Bertinetto, P.M.: Sulle proprietà tempo-aspettuali dell’infinito in italiano. In: Atti del 35 Congresso Internazionale della Società di Linguistica Italiana (2001)
Bittar, A.: Annotation of events and temporal expressions in French texts. In: Proceedings of the Joint conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, pp. 48–51 (2009)
Bittar, A., Amsili, P., Denis, P., Danlos, L.: French TimeBank: an ISO-TimeML annotated reference corpus. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 130–134. Association for Computational Linguistics, Portland (2011)
Caselli, T.: Time, events and temporal relations: an empirical model for temporal processing of Italian texts. Ph.D. thesis, Dept. of Linguistics, University of Pisa (2009)
Caselli, T., Lenzi, V.B., Sprugnoli, R., Pianta, E., Prodanof, I.: Annotating events, temporal expressions and relations in Italian: The It-TimeML experience for the Ita-TimeBank. In: Proceedings of the Fifth Linguistic Annotation Workshop, pp. 143–151 (2011)
Caselli, T., Llorens, H., Navarro-Colorado, B., Saquete, E.: Data-driven approach using semantics for recognizing and classifying TimeML events in Italian. In: Proceedings of the International Conference Recent Advances in Natural Language Processing 2011, pp. 533–538. RANLP 2011 Organising Committee, Hissar (2011)
Caselli, T., Sprugnoli, R.: It-TimeML - TimeML Annotation Guidelines for Italian, v. 1.4. Technical report, VU Amsterdam and Fondazione Bruno Kessler (2015)
Caselli, T., Sprugnoli, R., Speranza, M., Monachini, M.: Eventi. EValuation of events and temporal INformation at Evalita 2014. In: Bosco, C., Dell’Orletta, F., Montemagni, S., Simi, M. (eds.) Evaluation of Natural Language and Speech Tools for Italian, pp. 27–34. Pisa University Press, Pisa (2014)
Costa, F., Branco, A.: TimeBankPT: a TimeML annotated corpus of Portuguese. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation, pp. 3727–3734 (2012)
Eshaghzadeh Torbati, M., Ghassem-sani, G., Mirroshandel, S.A., Yaghoobzadeh, Y., Karimi Hosseini, N.: Temporal relation classification in Persian and English contexts. In: Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013, pp. 261–269. INCOMA Ltd. Shoumen (2013)
Forascu, C.: Why don’t Romanians have a five O’clock tea, nor Halloween, but have a kind of valentines day? In: 9th International Computational Linguistics and Intelligent Text Processing Conference (CICLing 2008). LNCS, vol. 4919, pp. 73–84. Springer (2008)
TimeML Working Group: TimeML Annotation Guidelines Version 1.3. Brandeis University, Boston (2008)
Im, S., You, H., Jang, H., Nam, S., Shin, H.: KTimeML: specification of temporal and event expressions in Korean text. In: Proceedings of the 7th Workshop on Asian Language Resources, pp. 115–122. Association for Computational Linguistics (2009)
ISO, S.W.G.: ISO DIS 24617–1: 2008 Language resource management - Semantic annotation framework - Part 1: Time and events. ISO Central Secretariat, Geneva (2008)
Linguistic Data Consortium: ACE (Automatic Content Extraction) English Annotation Guidelines for Entities, Version 6.6 2008.06.13 (2008)
Magnini, B., Pianta, E., Girardi, C., Negri, M., Romano, L., Speranza, M., Bartalesi Lenzi, V., Sprugnoli, R.: I-CAB: The Italian content annotation bank. In: Proceedings of the Fifth International Conference on Language Resources and Evaluation, pp. 963–968 (2006)
Manfredi, G., Strötgen, J., Zell, J., Gertz, M.: HeidelTime at EVENTI: tuning Italian resources and addressing TimeML empty tags. In: Proceedings of the 4th Evaluation Campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2014), pp. 39–43. Pisa University Press, Pisa (2014)
Mani, I.: Chronoscopes: a theory of underspecified temporal representation. In: Schilder, F., Katz, G., Pustejovsky, J. (eds.) Annotating, Extracting and Reasoning about Time and Events. LNAI, pp. 127–139. Springer, Berlin (2007)
Marinelli, R., Biagini, L., Bindi, R., Goggi, S., Monachini, M., Orsolini, P., Picchi, E., Rossi, S., Calzolari, N., Zampolli, A.: The Italian PAROLE corpus: an overview. In: Computational Linguistics in Pisa, XVI-XVII, IEPI., I, pp. 401–421 (2003)
Miller, T.A., Bethard, S., Dligach, D., Pradhan, S., Lin, C., Savova, G.K.: Discovering temporal narrative containers in clinical text. In: Proceedings of the Workshop on Biomedical Natural Language Processing, pp. 18–26 (2013)
Mirza, P., Minard, A.L.: FBK-HLT-time: a complete Italian temporal processing system for EVENTI-EVALITA 2014. In: Proceedings of the 4th Evaluation Campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2014), pp. 44–49. Pisa University Press, Pisa (2014)
Montemagni, S., Barsotti, F., Battista, M., Calzolari, N., Corazzari, O., Lenci, A., Pirrelli, V., Zampolli, A., Fanciulli, F., Massetani, M., Raffaelli, R., Basili, R., Pazienza, M.T., Saracino, D., Zanzotto, F., Mana, N., Pianesi, F., Delmonte, R.: The syntactic-semantic treebank of Italian. An overview. In: Computational Linguistics in Pisa, Special Issue XVIII-XIX, pp. 461–493 (2003)
Prasad, R., Dinesh, N., Lee, A., Miltsakaki, E., Robaldo, L., Joshi, A., Webber, B.: The Penn Discourse TreeBank 2.0. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08). European Language Resources Association (ELRA), Marrakech (2008)
Pustejovsky, J., Castaño, J.M., Ingria, R., Saurí, R., Gaizauskas, R., Setzer, A., Katz, G.: TimeML: Robust specification of event and temporal expressions in text. In: Proceedings of the Fifth International Workshop on Computational Semantics (2003)
Pustejovsky, J., Littman, J., Saurí, R., Verhagen, M.: TimeBank 1.2 Documentation (2006)
Pustejovsky, J., Stubbs, A.: Increasing informativeness in temporal annotation. In: Proceedings of the fifth Linguistic Annotation Workshop, pp. 152–160. Association for Computational Linguistics (2011)
Pustejovsky, J., Stubbs, A.: Natural Language Annotation for Machine Learning. O’Reilly Media Inc., Sebastopol (2012)
Robaldo, L., Caselli, T., Grella, M.: Rule-based creation of TimeML documents from dependency trees. In: Pirrone, R., Sorbello, F. (eds.) AI*IA 2011: Artificial Intelligence Around Man and Beyond, pp. 389–394. Springer, Heidelberg (2011)
Saurí, R.: Annotating Temporal Relations in Catalan and Spanish TimeML Annotation Guidelines (2010)
Saurí, R., Pustejovsky, J.: Annotating Events in Catalan - TimeML Annotation Guidelines (Version TempEval-2010) (2009)
Saurí, R., Pustejovsky, J.: Annotating Time Expressions in Catalan - TimeML Annotation Guidelines (Version TempEval-2010) (2010)
Smith, C.S.: The Parameter of Aspect. Kluwer Academic Publishers, Dordrecht (1997)
Stubbs, A.: MAE and MAI: lightweight annotation and adjudication tools. In: Proceedings of the fifth Linguistic Annotation Workshop, pp. 129–133. Association for Computational Linguistics (2011)
UzZaman, N., Llorens, H., Derczynski, L., Allen, J., Verhagen, M., Pustejovsky, J.: SemEval-2013 task 1: TempEval-3: evaluating time expressions, events, and temporal relations. In: Second Joint Conference on Lexical and Computational Semantics (*SEM). Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), pp. 1–9. Association for Computational Linguistics, Atlanta (2013)
Vanelli, L.: La concordanza dei tempi. In: Renzi, L., Salvi, G., Cardinaletti, A. (eds.) Grande Grammatica Italiana di Consultazione. I sintagmi verbale, aggettivale e avverbiale. La Subordinazione, vol. II, pp. 611–632. Il Mulino (2001)
Verhagen, M.: The Brandeis Annotation Tool. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation, pp. 3638–3643. European Language Resources Association (ELRA), Valletta (2010). ACL Anthology Identifier: L10-1513
Verhagen, M., Gaizauskas, R., Schilder, F., Hepple, M., Katz, G., Pustejovsky, J.: SemEval-2007 task 15: TempEval temporal relation identification. In: Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pp. 75–80 (2007)
Verhagen, M., Saurí, R., Caselli, T., Pustejovsky, J.: SemEval-2010 task 13: TempEval-2. In: Proceedings of the 5th International Workshop on Semantic Evaluation, pp. 57–62. ACL, Uppsala (2010)
Yaghoobzadeh, Y., Ghassem-Sani, G., Mirroshandel, S.A., Eshaghzadeh, M.: ISO-TimeML event extraction in Persian text. In: Proceedings of the 24th International Conference on Computational Linguistics, pp. 2931–2944 (2012)
Acknowledgements
The development of the ILC corpus was supported by two grants from the Istituto di Linguistica Computazionale - CNR of Pisa: “Disegno di Standard e Costruzione di Risorse Linguistico Computazionali”, IC-P02-ILC-CNR, and “Risorse e Tecnologie Linguistiche: modelli, metodi di sviluppo, applicazioni, disegno di strategie internazionali”, IC.P02.005. Assistance provided by Irina Prodanof and Nicoletta Calzolari was greatly appreciated.
The development of the CELCT corpus was supported by the LiveMemories project (Active Digital Memories of Collective Life), funded by the Autonomous Province of Trento under the Major Projects 2006 research program, and by the European Union’s 7th Framework Programme via the NewsReader Project (ICT-316404). Emanuele Pianta and Valentina Bartalesi Lenzi made invaluable contributions to the creation of the CELCT Corpus. Special thanks go to Giovanni Moretti and Alessandro Marchetti, who collaborated with us in processing and annotating the CELCT corpus.
Copyright information
© 2017 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Caselli, T., Sprugnoli, R. (2017). It-TimeML and the Ita-TimeBank: Language Specific Adaptations for Temporal Annotation. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_36
DOI: https://doi.org/10.1007/978-94-024-0881-2_36
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences, Social Sciences (R0)