On searching and indexing sequences of temporal intervals

Data Mining and Knowledge Discovery

Abstract

In several application domains, including sign language, sensor networks, and medicine, events are not necessarily instantaneous but may have a duration. Such events build sequences of temporal intervals, which may convey useful domain knowledge; thus, searching and indexing these sequences is crucial. We formulate the problem of comparing sequences of labeled temporal intervals and present a distance measure that can be computed in polynomial time. We prove that the distance measure is a metric and, in particular, satisfies the triangle inequality. For speeding up search in large databases of sequences of temporal intervals, we propose an approximate indexing method that is based on embeddings. The proposed indexing framework is shown to be contractive and can guarantee no false dismissal. The distance measure is tested and benchmarked through rigorous experimentation on real data taken from several application domains, including American Sign Language annotated video recordings, robot sensor data, and Hepatitis patient data. In addition, the indexing scheme is tested on a large synthetic dataset. Our experiments show that speedups of over an order of magnitude can be achieved while maintaining high levels of accuracy. As a result of our work, it becomes possible to implement recommender systems, search engines and assistive applications for the fields that employ sequences of temporal intervals.
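
The link between a metric distance, a contractive embedding, and the absence of false dismissals can be illustrated with a small filter-and-refine sketch. It is only an illustration of the general principle, not the indexing method of the paper: it assumes a reference-object embedding (each object is mapped to its distances from a few reference objects) and uses Levenshtein distance on strings as a stand-in metric rather than the interval-sequence distance. All function names are hypothetical. By the triangle inequality, |d(q, r) - d(x, r)| <= d(q, x) for any reference r, so the largest such gap never overestimates the true distance and pruning on it cannot discard a true match.

```python
from typing import Callable, List, Sequence, Tuple

Dist = Callable[[str, str], int]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: a stand-in metric, NOT the paper's measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def embed(x: str, refs: Sequence[str], dist: Dist) -> List[int]:
    """Map an object to its vector of distances from the reference objects."""
    return [dist(x, r) for r in refs]


def lower_bound(u: Sequence[int], v: Sequence[int]) -> int:
    """L-infinity distance between embeddings; never exceeds the true distance."""
    return max((abs(a - b) for a, b in zip(u, v)), default=0)


def range_search(query: str, database: Sequence[str],
                 embedded_db: Sequence[Sequence[int]], refs: Sequence[str],
                 dist: Dist, radius: int) -> Tuple[List[Tuple[str, int]], int]:
    """Return all objects within `radius` of `query` and the number of
    exact distance computations that were needed after filtering."""
    q_vec = embed(query, refs, dist)
    hits: List[Tuple[str, int]] = []
    exact_calls = 0
    for x, x_vec in zip(database, embedded_db):
        if lower_bound(q_vec, x_vec) > radius:
            continue  # safe to prune: true distance >= lower bound > radius
        exact_calls += 1
        d = dist(query, x)  # refine step with the exact distance
        if d <= radius:
            hits.append((x, d))
    return hits, exact_calls


if __name__ == "__main__":
    db = ["kitten", "sitting", "mitten", "flower", "tower", "power"]
    refs = ["kitten", "flower"]  # reference objects chosen arbitrarily
    vecs = [embed(x, refs, edit_distance) for x in db]  # computed offline
    found, calls = range_search("bitten", db, vecs, refs, edit_distance, 1)
    print(found, "exact distance computations:", calls)
```

The embeddings of the database objects are computed once, offline; at query time only the query is embedded, and the exact distance is evaluated only for objects that survive the filter. How much is pruned depends on the choice and number of reference objects, which this sketch does not address.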

Notes

  1. For the remainder of this paper we will refer to nearest-neighbor retrieval accuracy (Papapetrou et al. 2011) as “retrieval accuracy”. For clarity, we note that in this paper this term is used within the context of nearest neighbor similarity search and not in the context of information retrieval. A formal definition is provided in Sect. 6.

  2. http://users.ics.aalto.fi/kostakis/software/ArtemisJournal.

  3. http://www.ics.uci.edu/~mlearn/MLRepository.html.

  4. http://users.ics.aalto.fi/kostakis/software/INT-GEN.zip.

References

  • Abraham T, Roddick JF (1999) Incremental meta-mining from large temporal data sets. In: ER ’98: Proceedings of the Workshops on Data Warehousing and Data Mining, pp 1–37

  • Ale JM, Rossi GH (2000) An approach to discovering temporal association rules. In: Proceedings of the ACM Symposium On Applied Computing, pp 294–300

  • Allen JF (1983) Maintaining knowledge about temporal intervals. Commun ACM 26(11):832–843

  • Athitsos V, Hadjieleftheriou M, Kollios G, Sclaroff S (2007) Query-sensitive embeddings. ACM Trans Database Syst 32(2). doi:10.1145/1242524.1242525

  • Batal I, Sacchi L, Bellazzi R, Hauskrecht M (2009) Multivariate time series classification with temporal abstractions. In: FLAIRS

  • Batal I, Fradkin D, Harrison J, Moerchen F, Hauskrecht M (2012) Mining recent temporal patterns for event detection in multivariate time series data. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12, pp 280–288

  • Batal I, Valizadegan H, Cooper GF, Hauskrecht M (2013) A temporal pattern mining approach for classifying electronic health record data. ACM Trans Intell Syst Technol 4(4):63:1–63:22

  • Bentley JL, Friedman JH (1979) Data structures for range searching. ACM Comput Surv 11(4):397–409. doi:10.1145/356789.356797

  • Berendt B (1996) Explaining preferred mental models in Allen inferences with a metrical model of imagery. In: Proceedings of the Conference of the Cognitive Science Society, pp 489–494

  • Bergen B, Chang N (2005) Embodied construction grammar in simulation-based language understanding. In: Construction grammars: cognitive grounding and theoretical extensions, vol 3, pp 147–190

  • Bunke H (2000) Recent developments in graph matching. In: IEEE 15th International Conference on Pattern Recognition, vol 2, pp 117–124

  • Burrows M, Wheeler DJ (1994) A block-sorting lossless data compression algorithm. Tech. Rep. 124, Systems Research Center, Palo Alto. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.6774

  • Chen X, Petrounias I (1999) Mining temporal features in association rules. In: Proceedings of the 3rd European Conference on Principles and Practice of Knowledge Discovery in Databases, Springer, pp 295–300

  • Chen L, Ng R (2004) On the marriage of \(l_p\)-norms and edit distance. In: VLDB, pp 792–803

  • Chen L, Özsu MT (2005) Robust and fast similarity search for moving object trajectories. In: SIGMOD, pp 491–502

  • Chen YC, Jiang JC, Peng WC, Lee SY (2010) An efficient algorithm for mining time interval-based patterns in large database. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, CIKM ’10, pp 49–58

  • Chen YC, Peng WC, Lee SY (2011) CEMiner: an efficient algorithm for mining closed patterns from interval-based data. In: Proceedings of the IEEE International Conference on Data Mining (ICDM)

  • Chen YC, Weng JTY, Hui L (2015) A novel algorithm for mining closed temporal patterns from interval-based data. Knowl Inf Syst 46(1):151–183

  • Faloutsos C, Ranganathan M, Manolopoulos Y (1994) Fast subsequence matching in time-series databases. In: Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, ACM, New York, NY, USA, SIGMOD ’94, pp 419–429

  • Finkel RA, Bentley JL (1974) Quad trees: a data structure for retrieval on composite keys. Acta Inf 4:1–9. doi:10.1007/BF00288933

  • Fradkin D, Mörchen F (2015) Mining sequential patterns for classification. Knowl Inf Syst 45(3):731–749

  • Gaede V, Günther O (1998) Multidimensional access methods. ACM Comput Surv 30(2):170–231

  • Giannotti F, Nanni M, Pedreschi D (2006) Efficient mining of temporally annotated sequences. In: Proceedings of the 6th SIAM Data Mining Conference, vol 124, pp 348–359

  • Gionis A, Indyk P, Motwani R (1999) Similarity search in high dimensions via hashing. In: Proceedings of the 25th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, VLDB ’99, pp 518–529. http://dl.acm.org/citation.cfm?id=645925.671516

  • Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, ACM, New York, NY, USA, SIGMOD ’84, pp 47–57. doi:10.1145/602259.602266

  • Han TS, Ko SK, Kang J (2007) Efficient subsequence matching using the longest common subsequence with a dual match index. In: International Workshop on Machine Learning and Data Mining in Pattern Recognition, Springer, pp 585–600

  • Hjaltason G, Samet H (2003) Properties of embedding methods for similarity searching in metric spaces. IEEE Trans Pattern Anal Mach Intell 25(5):530–549

  • Höppner F (2001) Discovery of temporal patterns: learning rules about the qualitative behaviour of time series. In: Proceedings of the European Conference on Principles of Knowledge Discovery in Databases, pp 192–203

  • Höppner F, Klawonn F (2001) Finding informative rules in interval sequences. In: Proceedings of the International Symposium on Advances in Intelligent Data Analysis, pp 123–132

  • Hwang SY, Wei CP, Yang WS (2004) Discovery of temporal patterns from process instances. Comput Ind 53(3):345–364

  • Itakura F (1975) Minimum prediction residual principle applied to speech recognition. IEEE Trans Acoust Speech and Signal Process 23(1):67–72

  • Kam P, Fu AW (2000) Discovering temporal patterns for interval-based events. In: Proceedings of the 2nd International Conference on Data Warehousing and Knowledge Discovery, pp 317–326

  • Keogh E (2002) Exact indexing of dynamic time warping. In: Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), pp 406–417

  • Klimov D, Shknevsky A, Shahar Y (2015) Exploration of patterns predicting renal damage in patients with diabetes type II using a visual temporal analysis laboratory. J Am Med Inform Assoc 22(2):275–289

  • Kosara R, Miksch S (2001) Visualizing complex notions of time. Stud Health Technol Inform 1:211–215

  • Kostakis O (2014) Classy: fast clustering streams of call-graphs. Data Min Knowl Discov 28(5–6):1554–1585

  • Kostakis O, Gionis A (2015) Subsequence search in event-interval sequences. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, pp 851–854

  • Kostakis O, Papapetrou P (2015) Finding the longest common sub-pattern in sequences of temporal intervals. Data Min Knowl Discov 29(5):1178–1210

  • Kostakis O, Papapetrou P, Hollmén J (2011a) Artemis: assessing the similarity of event-interval sequences. In: Proceedings of the Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD 2011), pp 229–244

  • Kostakis O, Papapetrou P, Hollmén J (2011b) Distance measure for querying arrangements of temporal intervals. In: Proceedings of Pervasive Technologies Related to Assistive Environments

  • Kotsifakos A, Papapetrou P, Athitsos V (2013) IBSM: Interval-based sequence matching. In: Proceedings of SIAM Conference on Data Mining, pp 596–604

  • Kruskal JB, Liberman M (1983) The symmetric time warping algorithm: from continuous to discrete. In: Time warps, Addison-Wesley

  • Laxman S, Sastry P, Unnikrishnan K (2007) Discovering frequent generalized episodes when events persist for different durations. IEEE Trans Knowl Data Eng 19(9):1188–1201

  • Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals. Sov Phys 10(8):707–710

  • Li C, Lu J, Lu Y (2008) Efficient merging and filtering algorithms for approximate string searches. In: International Conference on Data Engineering (ICDE)

  • Li Y, Patel JM, Terrell A (2012) Wham: a high-throughput sequence alignment method. ACM Trans Database Syst (TODS) 37(4):28

  • Lin JL (2003) Mining maximal frequent intervals. In: Proceedings of the ACM Symposium On Applied Computing, pp 624–629

  • Maier D (1978) The complexity of some problems on subsequences and supersequences. J ACM 25(2):322–336

  • Mooney C, Roddick JF (2004) Mining relationships between interacting episodes. In: Proceedings of the 4th SIAM International Conference on Data Mining

  • Mörchen F (2007) Unsupervised pattern mining from symbolic temporal data. SIGKDD Explor Newsl 9:41–55

  • Mörchen F (2010) Temporal pattern mining in symbolic time point and time interval data. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, KDD ’10, pp 2:1–2:1

  • Mörchen F, Fradkin D (2010) Robust mining of time intervals with semi-interval partial order patterns. In: Proceedings of the SIAM International Conference on Data Mining, pp 315–326

  • Moskovitch R, Shahar Y (2009) Medical temporal-knowledge discovery via temporal abstraction. Proceedings of the AMIA Annual Symposium 2009:452–456

  • Moskovitch R, Shahar Y (2014a) Classification-driven temporal discretization of multivariate time series. Data Min Knowl Discov 29(4):871–913

  • Moskovitch R, Shahar Y (2014b) Classification of multivariate time series via temporal abstraction and time intervals mining. Knowl Inf Syst 45(1):35–74

  • Moskovitch R, Shahar Y (2015) Fast time intervals mining using the transitivity of temporal relations. Knowl Inf Syst 42(1):21–48

  • Munkres J (1957) Algorithms for the assignment and transportation problems. J Soc Ind Appl Math 5(1):32–38

  • Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48(3):443–453

  • Orlandic R, Yu B (2002) A retrieval technique for high-dimensional data and partially specified queries. Data Knowl Eng 42(1):1–21. doi:10.1016/S0169-023X(02)00023-X

  • Pachet F, Ramalho G, Carrive J (1996) Representing temporal musical objects and reasoning in the MusES system. J New Music Res 25(3):252–275

  • Papapetrou P, Kollios G, Sclaroff S, Gunopulos D (2005) Discovering frequent arrangements of temporal intervals. In: Proceedings of IEEE International Conference on Data Mining, pp 354–361

  • Papapetrou P, Benson G, Kollios G (2006) Discovering frequent poly-regions in DNA sequences. In: Proceedings of the IEEE ICDM Workshop on Data Mining in Bioinformatics

  • Papapetrou P, Athitsos V, Kollios G, Gunopulos D (2009a) Reference-based alignment in large sequence databases. Proc VLDB Endow 2(1):205–216

  • Papapetrou P, Kollios G, Sclaroff S, Gunopulos D (2009b) Mining frequent arrangements of temporal intervals. Knowl Inf Syst 21:133–171

  • Papapetrou P, Athitsos V, Potamias M, Kollios G, Gunopulos D (2011) Embedding-based subsequence matching in time-series databases. ACM Trans Database Syst 36(3):17:1–17:39

  • Patel D, Hsu W, Lee M (2008) Mining relationships among interval-based events for classification. In: Proceedings of the 28th ACM SIGMOD International Conference on Management of Data, ACM, pp 393–404

  • Pissinou N, Radev I, Makki K (2001) Spatio-temporal modeling in video and multimedia geographic information systems. GeoInformatica 5(4):375–409

  • Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2012) Searching and mining trillions of time series subsequences under dynamic time warping. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12, pp 262–270

  • Sacchi L, Larizza C, Combi C, Bellazzi R (2007) Data mining with temporal abstractions: learning rules from time series. Data Min Knowl Discov 15(2):217–247

  • Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Acoust Speech and Signal Process 26(1):43–49

  • Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147(1):195–197

  • Umeyama S (1988) An eigendecomposition approach to weighted graph matching problems. IEEE Trans Pattern Anal Mach Intell 10(5):695–703

  • Venkateswaran J, Lachwani D, Kahveci T, Jermaine C (2006) Reference-based indexing of sequence databases. In: International Conference on Very Large Databases (VLDB), pp 906–917

  • Villafane R, Hua KA, Tran D, Maulik B (2000) Knowledge discovery from series of interval events. J Intell Inf Syst 15(1):71–89

  • Weber R, Schek HJ, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: Proceedings of the 24th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, VLDB ’98, pp 194–205. http://dl.acm.org/citation.cfm?id=645924.671192

  • Winarko E, Roddick JF (2007) Armada: an algorithm for discovering richer relative temporal association rules from interval-based data. Data Knowl Eng 63(1):76–90

  • Wu SY, Chen YL (2007) Mining nonambiguous temporal patterns for interval-based events. IEEE Trans Knowl Data Eng 19(6):742–758

  • Yang X, Wang B, Li C (2008) Cost-based variable-length-gram selection for string collections to support approximate queries efficiently. In: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, ACM, pp 353–364

  • Yi BK, Roh JW (2004) Similarity search for interval time sequences. In: International Conference on Database Systems for Advanced Applications, Springer, pp 232–243

Acknowledgements

The work of Orestis Kostakis was supported in part by the Helsinki Doctoral Education Network in Information and Communications Technology (HICT). The work of Panagiotis Papapetrou was supported in part by the Stockholm City Council (Stockholms Läns Landsting).

Author information

Corresponding author

Correspondence to Orestis Kostakis.

Additional information

Responsible editor: Eamonn Keogh.

About this article

Cite this article

Kostakis, O., Papapetrou, P. On searching and indexing sequences of temporal intervals. Data Min Knowl Disc 31, 809–850 (2017). https://doi.org/10.1007/s10618-016-0489-3

  • DOI: https://doi.org/10.1007/s10618-016-0489-3
