Integer programming ensemble of temporal relations classifiers


The extraction of temporal events from text and the classification of temporal relations among both temporal events and time expressions are major challenges at the interface of data mining and natural language processing. We present an ensemble method that reconciles the outputs of multiple heterogeneous classifiers of temporal expressions. We use integer programming, a constrained optimisation technique, to improve on the best result of any individual classifier by choosing consistent temporal relations from among those recommended by multiple classifiers. Our ensemble method is conceptually simple and empirically powerful. It allows us to encode knowledge about the structure of valid temporal expressions as a set of constraints. It obtains new state-of-the-art results on two recent natural language processing challenges, SemEval-2013 TempEval-3 (Temporal Annotation) and SemEval-2016 Task 12 (Clinical TempEval), with F1 scores of 0.3915 and 0.595 respectively.
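To make the idea of choosing consistent relations concrete, the following is a minimal sketch and not the formulation used in this work. It assumes a hypothetical toy instance with three events, two relation labels, and invented vote counts, and it replaces an integer-programming solver with exhaustive enumeration of the 0-1 choices, which is only feasible at this tiny scale. The names `VOTES`, `is_consistent` and `solve` are illustrative.

```python
from itertools import product

# Hypothetical toy instance (not data from the paper).
RELATIONS = ("BEFORE", "AFTER")
PAIRS = (("e1", "e2"), ("e2", "e3"), ("e1", "e3"))

# VOTES[(pair, relation)] = number of classifiers recommending that relation.
VOTES = {
    (("e1", "e2"), "BEFORE"): 3,
    (("e2", "e3"), "BEFORE"): 2,
    (("e1", "e3"), "AFTER"): 2,
    (("e1", "e3"), "BEFORE"): 1,
}


def is_consistent(assignment):
    """Transitivity constraint on the triple (e1, e2, e3).

    If e1 BEFORE e2 and e2 BEFORE e3, then e1 AFTER e3 is forbidden,
    and symmetrically with BEFORE and AFTER swapped.
    """
    ab, bc, ac = (assignment[pair] for pair in PAIRS)
    if ab == bc == "BEFORE" and ac == "AFTER":
        return False
    if ab == bc == "AFTER" and ac == "BEFORE":
        return False
    return True


def solve():
    """Enumerate every labelling and keep the consistent one with most votes.

    Each pair receives exactly one relation or None (no relation reported),
    which plays the role of the "at most one relation per pair" constraint.
    """
    best_score, best_assignment = -1, None
    for choice in product(RELATIONS + (None,), repeat=len(PAIRS)):
        assignment = dict(zip(PAIRS, choice))
        if not is_consistent(assignment):
            continue
        score = sum(
            VOTES.get((pair, rel), 0)
            for pair, rel in assignment.items()
            if rel is not None
        )
        if score > best_score:
            best_score, best_assignment = score, assignment
    return best_score, best_assignment


if __name__ == "__main__":
    score, assignment = solve()
    print(score, assignment)
```

In this toy instance, a per-pair majority vote would label (e1, e3) as AFTER, which contradicts the two BEFORE relations. The constrained search instead selects BEFORE for all three pairs, trading one vote for global consistency. A real instance would hand the same objective and constraints to an integer-programming solver rather than enumerate.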

  1.

    In Clinical TempEval 2016, sets were not reduced, so \(S_r = S\) and \(H_r = H\).

  2.

    We note these numbers are directly comparable to the performance of individual classifiers on S2, as captured in Tables 6 and 7, but not the performance of the individual classifiers on \(S1 \cup S2\), as documented in Table 4.

  3.

    The ROC curve is usually plotted in terms of recall (true positive rate) and the false positive rate, rather than precision. We use recall and precision to keep the values directly comparable with Tables 4, 5, 6, 7 and 8.

  4.

    UtahBMI submitted corrected classifiers to the Task 12 challenge, but their results arrived too late for formal inclusion. We opted to use these corrected results instead.


The authors would like to thank Chambers (2013), Bethard (2013), Laokulrat et al. (2013), Leeuwenberg and Moens (2016), Caselli and Morante (2016), Chikka (2016), Grouin and Moriceau (2016), Khalifa et al. (2016), Cohan et al. (2016), and Lee et al. (2016), whose work and data, kindly shared with us, made this research possible. Jakub Marecek has received funding from the European Union Horizon 2020 Programme (Horizon2020/2014-2020), under Grant Agreement No. 688380.

  • Natural language processing
  • Temporal reasoning
  • Ensemble methods
  • Integer programming