Integer programming ensemble of temporal relations classifiers


The extraction of temporal events from text and the classification of temporal relations among both temporal events and time expressions are major challenges at the interface of data mining and natural language processing. We present an ensemble method that reconciles the outputs of multiple heterogeneous classifiers of temporal expressions. We use integer programming, a constrained optimisation technique, to improve on the best result of any individual classifier by choosing consistent temporal relations from among those recommended by multiple classifiers. Our ensemble method is conceptually simple and empirically powerful. It allows us to encode knowledge about the structure of valid temporal expressions as a set of constraints. It obtains new state-of-the-art results on two recent natural language processing challenges, SemEval-2013 TempEval-3 (Temporal Annotation) and SemEval-2016 Task 12 (Clinical TempEval), with F1 scores of 0.3915 and 0.595, respectively.
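The idea of choosing a consistent set of relations from among those recommended by several classifiers can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the event names, the vote counts, the restriction to BEFORE/AFTER labels, and the use of acyclicity as the only consistency constraint. Brute-force enumeration stands in for an integer programming solver, so this is not the formulation used in the article.

```python
from itertools import product

# Hypothetical input: for each event pair, the number of classifiers
# recommending each relation. These values are made up for illustration.
votes = {
    ("e1", "e2"): {"BEFORE": 3, "AFTER": 1},
    ("e2", "e3"): {"BEFORE": 3},
    ("e1", "e3"): {"AFTER": 2, "BEFORE": 1},
}


def is_consistent(assignment):
    """Return True if the chosen BEFORE/AFTER relations contain no cycle."""
    edges = set()
    for (a, b), rel in assignment.items():
        edges.add((a, b) if rel == "BEFORE" else (b, a))
    # Naive transitive closure; a self-loop (x, x) signals a cycle.
    changed = True
    while changed:
        changed = False
        for (a, b) in list(edges):
            for (c, d) in list(edges):
                if b == c and (a, d) not in edges:
                    edges.add((a, d))
                    changed = True
    return all(a != b for a, b in edges)


def best_consistent(votes):
    """Enumerate every selection (at most one relation per pair, or none)
    and keep the consistent selection with the highest total vote count."""
    pairs = list(votes)
    options = [list(votes[p]) + [None] for p in pairs]
    best, best_score = {}, -1
    for choice in product(*options):
        assignment = {p: r for p, r in zip(pairs, choice) if r is not None}
        if not is_consistent(assignment):
            continue
        score = sum(votes[p][r] for p, r in assignment.items())
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score


best, score = best_consistent(votes)
print(best, score)
```

In this toy instance, taking the most-voted relation for each pair independently would give e1 BEFORE e2, e2 BEFORE e3 and e1 AFTER e3, which is cyclic. The constrained search instead returns the best selection that respects the ordering. Enumeration grows exponentially with the number of pairs, which is one reason a dedicated solver would be preferred at realistic scale.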



  1. In Clinical TempEval 2016, sets were not reduced, so \(S_r = S\) and \(H_r = H\).

  2. We note that these numbers are directly comparable to the performance of individual classifiers on S2, as captured in Tables 6 and 7, but not to the performance of the individual classifiers on \(S1 \cup S2\), as documented in Table 4.

  3. The ROC curve is usually plotted in terms of recall (true positive rate) and the false positive rate, rather than precision. We use recall and precision to keep the values directly comparable with Tables 4, 5, 6, 7 and 8.

  4. UtahBMI submitted corrected classifiers to the Task 12 challenge, whose results were too late for formal inclusion. We opted to use these data instead.
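As a small illustration of the distinction drawn in note 3, the sketch below computes precision, recall, false positive rate and F1 from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the article's tables; the function assumes all denominators are non-zero.

```python
def rates(tp, fp, fn, tn):
    """Return (precision, recall, false positive rate, F1) from raw counts.

    Assumes tp + fp, tp + fn, fp + tn and precision + recall are non-zero.
    """
    precision = tp / (tp + fp)   # share of predicted relations that are correct
    recall = tp / (tp + fn)      # true positive rate
    fpr = fp / (fp + tn)         # false positive rate, the usual ROC x-axis
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, fpr, f1


# Hypothetical counts, chosen only to show how the quantities differ.
precision, recall, fpr, f1 = rates(tp=30, fp=20, fn=30, tn=920)
print(precision, recall, fpr, f1)
```

With many true negatives, as in these made-up counts, the false positive rate stays small even when precision is moderate, which is why the two axes are not interchangeable.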


  1. Allen JF (1983) Maintaining knowledge about temporal intervals. Commun ACM 26(11):832–843

  2. Ashish N, Eguchi R, Hegde R, Huyck C, Kalashnikov D, Mehrotra S, Smyth P, Venkatasubramanian N (2008) Situational awareness technologies for disaster response. In: Chen H, Reid E, Sinai J, Silke A, Ganor B (eds) Terrorism informatics. Springer, Boston, pp 517–544

  3. Benetka JR, Balog K, Nørvåg K (2017) Towards building a knowledge base of monetary transactions from a news collection. In: Proceedings of the 17th ACM/IEEE joint conference on digital libraries, JCDL ’17, pp 209–218, Piscataway, NJ, USA, 2017. IEEE Press. ISBN 978-1-5386-3861-3

  4. Bethard S (2013) ClearTK-TimeML: a minimalist approach to TempEval 2013. In: 2nd joint conference on lexical and computational semantics (*SEM), vol 2, pp 10–14

  5. Bethard S, Savova G, Chen W-T, Derczynski L, Pustejovsky J, Verhagen M (2016) SemEval-2016 task 12: Clinical TempEval. In: Proceedings of SemEval, pp 1052–1062

  6. Bhattacharya I, Getoor L (2007) Collective entity resolution in relational data. ACM Trans Knowl Discov Data 1(1):5. ISSN 1556-4681

  7. Bier EA, Card SK, Bodnar JW (2008) Entity-based collaboration tools for intelligence analysis. In: IEEE symposium on visual analytics science and technology. VAST’08, 2008. IEEE, pp 99–106

  8. Burke EK, Mareček J, Parkes AJ, Rudová H (2012) A branch-and-cut procedure for the Udine course timetabling problem. Ann Oper Res 194(1):71–87

  9. Caselli T, Morante R (2016) VUACLTL at SemEval 2016 task 12: a CRF pipeline to clinical TempEval. In: Proceedings of SemEval, pp 1241–1247

  10. Cesa-Bianchi N, Lugosi G (2006) Prediction, learning, and games. Cambridge University Press, Cambridge

  11. Chambers N (2013) NavyTime: event and time ordering from raw text. In: 2nd joint conference on lexical and computational semantics (*SEM), vol 2. Association for Computational Linguistics, pp 73–77

  12. Chambers N, Jurafsky D (2008) Jointly combining implicit constraints improves temporal ordering. In: Proceedings of the 2008 conference on empirical methods in natural language processing. Association for Computational Linguistics, pp 698–706

  13. Chikka VR (2016) CDE-IIITH at SemEval-2016 task 12: extraction of temporal information from clinical documents using machine learning techniques. In: Proceedings of SemEval, pp 1237–1240

  14. Cohan A, Meurer K, Goharian N (2016) GUIR at SemEval-2016 task 12: temporal information processing for clinical narratives. In: Proceedings of SemEval, pp 1248–1255

  15. Comfort S, Perera S, Hudson Z, Dorrell D, Meireis S, Nagarajan M, Ramakrishnan C, Fine J (2018) Sorting through the safety data haystack: using machine learning to identify individual case safety reports in social-digital media. Drug Saf 41(6):579–590. https://doi.org/10.1007/s40264-018-0641-7. ISSN 1179-1942

  16. Daykin JW, Miller M, Ryan J (2016) Trends in temporal reasoning: constraints, graphs and posets. In: Kotsireas IS, Rump SM, Yap CK (eds) Mathematical aspects of computer and information sciences. Springer International Publishing, Cham, pp 290–304. ISBN 978-3-319-32859-1

  17. Dietterich TG (2000) Ensemble methods in machine learning. In: Multiple classifier systems. Springer, pp 1–15

  18. Do QX, Lu W, Roth D (2012) Joint inference for event timeline construction. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning. Association for Computational Linguistics, pp 677–687

  19. Elkin PL, Froehling DA, Wahner-Roedler DL, Brown SH, Bailey KR (2012) Comparison of natural language processing biosurveillance methods for identifying influenza from encounter notes. Ann Intern Med 156(1-Part-1):11–18

  20. Flach PA (2003) The geometry of ROC space: understanding machine learning metrics through ROC isometrics. In: Proceedings of the 20th international conference on machine learning (ICML-03), pp 194–201

  21. Florian R, Cucerzan S, Schaefer C, Yarowsky D (2002) Combining classifiers for word sense disambiguation. Nat Lang Eng 8:327–341

  22. Forrest J, Lougee-Heimer R (2005) CBC user guide. In: INFORMS tutorials in operations research, pp 257–277

  23. Glavaš G, Šnajder J (2015) Construction and evaluation of event graphs. Nat Lang Eng 21:607–652

  24. Grouin C, Moriceau V (2016) LIMSI at SemEval-2016 task 12: machine-learning and temporal information to identify clinical events and time expressions. In: Proceedings of SemEval, pp 1225–1230

  25. Hart WE, Laird C, Watson J-P, Woodruff DL (2012) Pyomo – optimization modeling in Python. Springer, Berlin

  26. Huang C-C, Lu Z (2016) Community challenges in biomedical text mining over 10 years: success, failure and the future. Brief Bioinform 17(1):132–144. https://doi.org/10.1093/bib/bbv024

  27. Khalifa A, Velupillai S, Meystre S (2016) UtahBMI at SemEval-2016 task 12: extracting temporal information from clinical text. In: Proceedings of SemEval, pp 1256–1262

  28. Kittler J, Hatef M, Duin RPW, Matas J (1998) On combining classifiers. IEEE Trans Pattern Anal Mach Intell 20(3):226–239

  29. Ladkin PB (1990) Constraint reasoning with intervals: a tutorial, survey and bibliography. International Computer Science Institute, Berkeley

  30. Laokulrat N, Miwa M, Tsuruoka Y, Chikayama T (2013) UTTime: temporal relation classification using deep syntactic features. In: 2nd joint conference on lexical and computational semantics (*SEM), vol 2, pp 88–92

  31. Lee H-J, Zhang Y, Xu J, Moon S, Wang J, Wu Y, Xu H (2016) UTHealth at SemEval-2016 task 12: an end-to-end system for temporal information extraction from clinical notes. In: Proceedings of SemEval, pp 1292–1297

  32. Leeuwenberg A, Moens M-F (2016) KULeuven-LIIR at SemEval-2016 task 12: detecting narrative containment in clinical records. In: Proceedings of SemEval, pp 1280–1285

  33. Madhavan J, Jeffery SR, Cohen S, Dong X, Ko D, Yu C, Halevy A (2007) Web-scale data integration: you can only afford to pay as you go. In: CIDR, 2007

  34. Movshovitz-Attias D, Whang SE, Noy N, Halevy A (2010) Discovering subsumption relationships for web-based ontologies. In: Proceedings of the 18th international workshop on web and databases, WebDB’15, New York, NY, USA. ACM, pp 62–69. ISBN 978-1-4503-3627-7

  35. Nebel B, Bürckert H-J (1995) Reasoning about temporal relations: a maximal tractable subclass of Allen’s interval algebra. J ACM 42(1):43–66

  36. Nemhauser GL, Wolsey LA (1988) Integer and combinatorial optimization. Wiley, New York. ISBN 9780471828198; 047182819X

  37. Nuij W, Milea V, Hogenboom F, Frasincar F, Kaymak U (2013) An automated framework for incorporating news into stock trading strategies. IEEE Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2013.133 ISSN 1041-4347

  38. Papadimitriou CH, Steiglitz K (1998) Combinatorial optimization: algorithms and complexity. Courier Corporation, North Chelmsford

  39. Powers DMW (2011) Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. J Mach Learn Technol 2(1):37–63

  40. Punyakanok V, Roth D, Yih W, Zimak D (2004) Semantic role labeling via integer linear programming inference. In: Proceedings of the 20th international conference on computational linguistics. Association for Computational Linguistics, p 1346

  41. Pustejovsky J, Stubbs A (2011) Increasing informativeness in temporal annotation. In: Proceedings of the 5th linguistic annotation workshop. Association for Computational Linguistics, pp 152–160

  42. Pustejovsky J, Ingria B, Sauri R, Castano J, Littman J, Gaizauskas R, Setzer A, Katz G, Mani I (2005) The specification language TimeML. The language of time: a reader, pp 545–557

  43. Rokach L (2010) Pattern classification using ensemble methods, vol 75. World Scientific, Singapore

  44. Roth D, Yih W (2004) A linear programming formulation for global inference in natural language tasks. In: Proceedings of CoNLL-2004

  45. Saurí R, Knippen R, Verhagen M, Pustejovsky J (2005) Evita: a robust event recognizer for QA systems. In: Proceedings of the conference on human language technology and empirical methods in natural language processing, HLT ’05, Stroudsburg, PA, USA, 2005. Association for Computational Linguistics, pp 700–707

  46. Saurí R, Goldberg L, Verhagen M, Pustejovsky J (2009) Annotating events in English. TimeML annotation guidelines. Brandeis University. Version TempEval-2010

  47. Sawilowsky SS (2009) New effect size rules of thumb. J Mod Appl Stat Methods 8:597–599

  48. Schrijver A (2003) Combinatorial optimization: polyhedra and efficiency, vol 24. Springer, Berlin

  49. Seni G, Elder JF (2010) Ensemble methods in data mining: improving accuracy through combining predictions. Synth Lect Data Min Knowl Discov 2(1):1–126

  50. Styler WF IV, Bethard S, Finan S, Palmer M, Pradhan S, de Groen PC, Erickson B, Miller T, Lin C, Savova G et al (2014) Temporal annotation in the clinical domain. Trans Assoc Comput Linguist 2:143–154

  51. Tatonetti NP, Ye PP, Daneshjou R, Altman RB (2012) Data-driven prediction of drug effects and interactions. Sci Transl Med 4(125):125ra31

  52. Hastie T, Tibshirani R, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, Berlin

  53. UzZaman N (2012) Interpreting the temporal aspects of language. University of Rochester, Thesis

  54. UzZaman N, Llorens H, Allen J, Derczynski L, Verhagen M, Pustejovsky J (2013) SemEval-2013 task 1: TempEval-3: evaluating time expressions, events, and temporal relations. In: 2nd joint conference on lexical and computational semantics (*SEM), pp 1–9. Association for Computational Linguistics. Also see preprint arXiv:1206.5333

  55. Verhagen M, Gaizauskas R, Schilder F, Hepple M, Moszkowicz J, Pustejovsky J (2009) The TempEval challenge: Identifying temporal relations in text. Lang Resour Eval 43(2):161–179. ISSN 1574020X, 15728412

  56. Woodsend K, Lapata M (2011) Learning to simplify sentences with quasi-synchronous grammar and integer programming. In: Proceedings of the 2011 conference on empirical methods in natural language processing. Association for Computational Linguistics, pp 409–420

  57. Zhou Z-H (2012) Ensemble methods: foundations and algorithms. Chapman and Hall/CRC, Boca Raton


The authors would like to thank Chambers (2013), Bethard (2013), Laokulrat et al. (2013), Leeuwenberg and Moens (2016), Caselli and Morante (2016), Chikka (2016), Grouin and Moriceau (2016), Khalifa et al. (2016), Cohan et al. (2016), and Lee et al. (2016), whose work and data, kindly shared with us, made this research possible. Jakub Marecek has received funding from the European Union Horizon 2020 Programme (Horizon2020/2014-2020), under Grant Agreement No. 688380.

Author information

Correspondence to Paula Carroll.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Responsible editor: Björn Bringmann, Jesse Davis, Elisa Fromont and Derek Greene.

About this article

Cite this article

Kerr, C., Hoare, T., Carroll, P. et al. Integer programming ensemble of temporal relations classifiers. Data Min Knowl Disc 34, 533–562 (2020). https://doi.org/10.1007/s10618-019-00671-x


  • Natural language processing
  • Temporal reasoning
  • Ensemble methods
  • Integer programming