Multilingual and cross-domain temporal tagging

Abstract

Extraction and normalization of temporal expressions from documents are important steps towards deep text understanding and a prerequisite for many NLP tasks such as information extraction, question answering, and document summarization. The same temporal information can be expressed in many different ways. Once temporal expressions have been identified, however, they can be normalized to a standard format, which allows temporal information to be used in a term- and language-independent way. In this paper, we describe the challenges of temporal tagging in different domains, give an overview of existing annotated corpora, and survey existing approaches for temporal tagging. Finally, we present our publicly available temporal tagger HeidelTime, which is easily extensible to further languages due to its strict separation of source code and language resources like patterns and rules. We present a broad evaluation on multiple languages and domains on existing corpora as well as on a newly created corpus for a language/domain combination for which no annotated corpus has been available so far.
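To illustrate what normalization to a standard format means, the following is a minimal Python sketch, not HeidelTime's implementation. It assumes ISO 8601 dates as the target format and handles only three hypothetical surface patterns (an English explicit date, a German-style numeric date, and a few relative expressions resolved against a reference date). The function names `normalize` and `to_timex3` are illustrative, and the TIMEX3 tag is simplified to the `type` and `value` attributes.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Hypothetical, deliberately tiny language resource: English month names.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Relative expressions, given as day offsets from the reference date.
RELATIVE_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def normalize(expression: str, reference: date) -> Optional[str]:
    """Map a date expression to an ISO 8601 string, or None if unrecognized."""
    text = expression.strip().lower()

    # Explicit English form, e.g. "March 11, 2011".
    match = re.fullmatch(r"([a-z]+) (\d{1,2}), (\d{4})", text)
    if match and match.group(1) in MONTHS:
        month = MONTHS[match.group(1)]
        return date(int(match.group(3)), month, int(match.group(2))).isoformat()

    # Numeric day.month.year form, e.g. "11.03.2011".
    match = re.fullmatch(r"(\d{1,2})\.(\d{1,2})\.(\d{4})", text)
    if match:
        day, month, year = (int(group) for group in match.groups())
        return date(year, month, day).isoformat()

    # Relative form, resolved against the reference date
    # (for news text, typically the document creation time).
    if text in RELATIVE_OFFSETS:
        return (reference + timedelta(days=RELATIVE_OFFSETS[text])).isoformat()

    return None


def to_timex3(expression: str, value: str) -> str:
    """Wrap an expression in a simplified TIMEX3 tag (type and value only)."""
    return f'<TIMEX3 type="DATE" value="{value}">{expression}</TIMEX3>'


if __name__ == "__main__":
    reference_date = date(2011, 3, 12)
    for surface in ["March 11, 2011", "11.03.2011", "yesterday"]:
        print(to_timex3(surface, normalize(surface, reference_date)))
```

All three surface forms receive the value `2011-03-11`, so downstream components can compare or index them without knowing the wording or the language of the original expression.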

Notes

  1. Recently, we were able to add resources for Dutch. These were developed and kindly provided by Matje van de Camp (Tilburg University).

  2. HeidelTime, the German corpus as well as additional scripts and components are publicly available at http://dbs.ifi.uni-heidelberg.de/heideltime/. Thus, all our evaluation results are reproducible.

  3. http://en.wikipedia.org/wiki/Wikipedia:Featured_articles.

  4. http://fofoca.mitre.org/.

  5. http://www.timeml.org/.

  6. The details of the attributes are described in the TimeML annotation guidelines including further attributes, e.g., to capture the function of a temporal expression in a document. For details, see http://www.timeml.org/.

  7. The 2004 and 2005 training sets and the 2004 evaluation set are released by LDC (LDC2005T07, LDC2006T06, LDC2010T18); see: http://www.ldc.upenn.edu/.

  8. The TERN evaluation script is available at http://fofoca.mitre.org/tern.html.

  9. TimeBank 1.2 is released by LDC (LDC2006T08); see: http://www.ldc.upenn.edu/.

  10. The TempEval-2 data are available at http://timeml.org/site/timebank/timebank.html. While TempEval-2 had a task for the extraction and normalization of temporal expressions, the first TempEval evaluation challenge concentrated on tasks for identifying temporal relations. Thus, we do not consider the corpus of the first TempEval here.

  11. WikiWars is available at http://www.timexportal.info/wikiwars/.

  12. WikiWarsDE is publicly available at http://dbs.ifi.uni-heidelberg.de/heideltime/.

  13. http://callisto.mitre.org/.

  14. http://timeml.org/site/tarsqi/modules/gutime/index.html.

  15. Due to this feature, we were able to include Dutch language resources recently developed at Tilburg University, see, http://dbs.ifi.uni-heidelberg.de/heideltime/.

  16. See http://uima.apache.org/.

  17. Our UIMA components as well as conversion scripts described in this section are available at http://dbs.ifi.uni-heidelberg.de/heideltime/.

  18. Results slightly differ from HeidelTime-1 due to some bug fixes.

Author information

Corresponding author

Correspondence to Michael Gertz.

About this article

Cite this article

Strötgen, J., Gertz, M. Multilingual and cross-domain temporal tagging. Lang Resources & Evaluation 47, 269–298 (2013). https://doi.org/10.1007/s10579-012-9179-y

Keywords

  • Temporal information
  • Temporal tagger
  • Named entity recognition
  • Named entity normalization
  • TIMEX2
  • TIMEX3