Abstract
In several application domains, including sign language, sensor networks, and medicine, events are not necessarily instantaneous but may have a duration. Such events form sequences of temporal intervals, which may convey useful domain knowledge; thus, searching and indexing these sequences is crucial. We formulate the problem of comparing sequences of labeled temporal intervals and present a distance measure that can be computed in polynomial time. We prove that the distance measure is a metric and, in particular, that it satisfies the triangle inequality. To speed up search in large databases of sequences of temporal intervals, we propose an approximate indexing method that is based on embeddings. The proposed indexing framework is shown to be contractive and can guarantee no false dismissals. The distance measure is tested and benchmarked through rigorous experimentation on real data taken from several application domains, including American Sign Language annotated video recordings, robot sensor data, and Hepatitis patient data. In addition, the indexing scheme is tested on a large synthetic dataset. Our experiments show that speedups of over an order of magnitude can be achieved while maintaining high levels of accuracy. As a result of our work, it becomes possible to implement recommender systems, search engines, and assistive applications for the fields that employ sequences of temporal intervals.
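To illustrate why a contractive embedding of a metric space rules out false dismissals, the following is a minimal sketch of generic reference-based (Lipschitz-style) filter-and-refine search. It is not the indexing method of the paper. The names `toy_distance`, `embed`, `lower_bound`, and `nearest_neighbor` are hypothetical, and `toy_distance` (the size of the symmetric difference between two sets of labeled intervals) is only a stand-in metric, not the distance measure proposed here. The sketch relies on one fact: for any metric d and reference object r, the triangle inequality gives |d(q, r) - d(x, r)| <= d(q, x), so the maximum of these differences over all references never overestimates the true distance.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Interval = Tuple[str, int, int]      # (label, start, end)
ESequence = FrozenSet[Interval]      # one sequence of labeled intervals
Distance = Callable[[ESequence, ESequence], float]


def toy_distance(a: ESequence, b: ESequence) -> float:
    """Stand-in metric (assumption): number of intervals in exactly one of a, b."""
    return float(len(a ^ b))


def embed(x: ESequence, refs: Sequence[ESequence], dist: Distance) -> List[float]:
    """Map x to the vector of its distances to the reference sequences."""
    return [dist(x, r) for r in refs]


def lower_bound(u: Sequence[float], v: Sequence[float]) -> float:
    """L-infinity distance between embeddings; never exceeds the true distance."""
    return max(abs(p - q) for p, q in zip(u, v))


def nearest_neighbor(query: ESequence, database: Sequence[ESequence],
                     refs: Sequence[ESequence], dist: Distance):
    """Filter-and-refine 1-NN search.

    Returns (index, distance, number of exact distance computations).
    """
    # In practice the database embeddings would be computed once, offline.
    db_vecs = [embed(s, refs, dist) for s in database]
    q_vec = embed(query, refs, dist)
    # Visit candidates in order of increasing lower bound.
    order = sorted(range(len(database)),
                   key=lambda i: lower_bound(q_vec, db_vecs[i]))
    best_i, best_d, evaluated = -1, float("inf"), 0
    for i in order:
        # Every remaining candidate has true distance >= its bound >= best_d,
        # so stopping here cannot dismiss a closer sequence.
        if lower_bound(q_vec, db_vecs[i]) >= best_d:
            break
        d = dist(query, database[i])
        evaluated += 1
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d, evaluated


if __name__ == "__main__":
    db = [
        frozenset({("A", 0, 5), ("B", 3, 8)}),
        frozenset({("A", 0, 5)}),
        frozenset({("C", 1, 2), ("D", 4, 9)}),
    ]
    q = frozenset({("A", 0, 5), ("B", 3, 9)})
    print(nearest_neighbor(q, db, refs=[db[0], db[2]], dist=toy_distance))
```

The early exit is safe only because the bound is contractive; with a non-metric distance the same loop could skip the true nearest neighbor.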
Notes
For the remainder of this paper we will refer to nearest-neighbor retrieval accuracy (Papapetrou et al. 2011) as “retrieval accuracy”. For clarity, we note that in this paper this term is used within the context of nearest neighbor similarity search and not in the context of information retrieval. A formal definition is provided in Sect. 6.
References
Abraham T, Roddick JF (1999) Incremental meta-mining from large temporal data sets. In: ER ’98: Proceedings of the Workshops on Data Warehousing and Data Mining, pp 1–37
Ale JM, Rossi GH (2000) An approach to discovering temporal association rules. In: Proceedings of the ACM Symposium On Applied Computing, pp 294–300
Allen JF (1983) Maintaining knowledge about temporal intervals. Commun ACM 26(11):832–843
Athitsos V, Hadjieleftheriou M, Kollios G, Sclaroff S (2007) Query-sensitive embeddings. ACM Trans Database Syst 32(2). doi:10.1145/1242524.1242525
Batal I, Sacchi L, Bellazzi R, Hauskrecht M (2009) Multivariate time series classification with temporal abstractions. In: FLAIRS
Batal I, Fradkin D, Harrison J, Moerchen F, Hauskrecht M (2012) Mining recent temporal patterns for event detection in multivariate time series data. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12, pp 280–288
Batal I, Valizadegan H, Cooper GF, Hauskrecht M (2013) A temporal pattern mining approach for classifying electronic health record data. ACM Trans Intell Syst Technol 4(4):63:1–63:22
Bentley JL, Friedman JH (1979) Data structures for range searching. ACM Comput Surv 11(4):397–409. doi:10.1145/356789.356797
Berendt B (1996) Explaining preferred mental models in Allen inferences with a metrical model of imagery. In: Proceedings of the Conference of the Cognitive Science Society, pp 489–494
Bergen B, Chang N (2005) Embodied construction grammar in simulation-based language understanding. In: Construction grammars: cognitive grounding and theoretical extensions, vol 3, pp 147–190
Bunke H (2000) Recent developments in graph matching. In: IEEE 15th International Conference on Pattern Recognition, vol 2, pp 117–124
Burrows M, Wheeler DJ (1994) A block-sorting lossless data compression algorithm. Tech. Rep. 124, Systems Research Center, Palo Alto. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.6774
Chen X, Petrounias I (1999) Mining temporal features in association rules. In: Proceedings of the 3rd European Conference on Principles and Practice of Knowledge Discovery in Databases, Springer, pp 295–300
Chen L, Ng R (2004) On the marriage of \(l_p\)-norms and edit distance. In: VLDB, pp 792–803
Chen L, Özsu MT (2005) Robust and fast similarity search for moving object trajectories. In: SIGMOD, pp 491–502
Chen YC, Jiang JC, Peng WC, Lee SY (2010) An efficient algorithm for mining time interval-based patterns in large database. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, CIKM ’10, pp 49–58
Chen YC, Peng WC, Lee SY (2011) CEMiner: an efficient algorithm for mining closed patterns from interval-based data. In: Proceedings of the IEEE International Conference on Data Mining (ICDM)
Chen YC, Weng JTY, Hui L (2015) A novel algorithm for mining closed temporal patterns from interval-based data. Knowl Inf Syst 46(1):151–183
Faloutsos C, Ranganathan M, Manolopoulos Y (1994) Fast subsequence matching in time-series databases. In: Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, ACM, New York, NY, USA, SIGMOD ’94, pp 419–429
Finkel RA, Bentley JL (1974) Quad trees: a data structure for retrieval on composite keys. Acta Inf 4:1–9. doi:10.1007/BF00288933
Fradkin D, Mörchen F (2015) Mining sequential patterns for classification. Knowl Inf Syst 45(3):731–749
Gaede V, Günther O (1998) Multidimensional access methods. ACM Comput Surv 30(2):170–231
Giannotti F, Nanni M, Pedreschi D (2006) Efficient mining of temporally annotated sequences. In: Proceedings of the 6th SIAM Data Mining Conference, vol 124, pp 348–359
Gionis A, Indyk P, Motwani R (1999) Similarity search in high dimensions via hashing. In: Proceedings of the 25th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, VLDB ’99, pp 518–529. http://dl.acm.org/citation.cfm?id=645925.671516
Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, ACM, New York, NY, USA, SIGMOD ’84, pp 47–57. doi:10.1145/602259.602266
Han TS, Ko SK, Kang J (2007) Efficient subsequence matching using the longest common subsequence with a dual match index. In: International Workshop on Machine Learning and Data Mining in Pattern Recognition, Springer, pp 585–600
Hjaltason G, Samet H (2003) Properties of embedding methods for similarity searching in metric spaces. IEEE Trans Pattern Anal Mach Intell 25(5):530–549
Höppner F (2001) Discovery of temporal patterns: learning rules about the qualitative behaviour of time series. In: Proceedings of the European Conference on Principles of Knowledge Discovery in Databases, pp 192–203
Höppner F, Klawonn F (2001) Finding informative rules in interval sequences. In: Proceedings of the International Symposium on Advances in Intelligent Data Analysis, pp 123–132
Hwang SY, Wei CP, Yang WS (2004) Discovery of temporal patterns from process instances. Comput Ind 53(3):345–364
Itakura F (1975) Minimum prediction residual principle applied to speech recognition. IEEE Trans Acoust Speech Signal Process 23(1):67–72
Kam P, Fu AW (2000) Discovering temporal patterns for interval-based events. In: Proceedings of the 2nd International Conference on Data Warehousing and Knowledge Discovery, pp 317–326
Keogh E (2002) Exact indexing of dynamic time warping. In: Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), pp 406–417
Klimov D, Shknevsky A, Shahar Y (2015) Exploration of patterns predicting renal damage in patients with diabetes type II using a visual temporal analysis laboratory. J Am Med Inform Assoc 22(2):275–289
Kosara R, Miksch S (2001) Visualizing complex notions of time. Stud Health Technol Inform 1:211–215
Kostakis O (2014) Classy: fast clustering streams of call-graphs. Data Min Knowl Discov 28(5–6):1554–1585
Kostakis O, Gionis A (2015) Subsequence search in event-interval sequences. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, pp 851–854
Kostakis O, Papapetrou P (2015) Finding the longest common sub-pattern in sequences of temporal intervals. Data Min Knowl Discov 29(5):1178–1210
Kostakis O, Papapetrou P, Hollmén J (2011a) Artemis: assessing the similarity of event-interval sequences. In: Proceedings of the Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD 2011), pp 229–244
Kostakis O, Papapetrou P, Hollmén J (2011b) Distance measure for querying arrangements of temporal intervals. In: Proceedings of Pervasive Technologies Related to Assistive Environments
Kotsifakos A, Papapetrou P, Athitsos V (2013) IBSM: Interval-based sequence matching. In: Proceedings of SIAM Conference on Data Mining, pp 596–604
Kruskal JB, Liberman M (1983) The symmetric time warping algorithm: from continuous to discrete. In: Time warps, Addison-Wesley
Laxman S, Sastry P, Unnikrishnan K (2007) Discovering frequent generalized episodes when events persist for different durations. IEEE Trans Knowl Data Eng 19(9):1188–1201
Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals. Sov Phys 10(8):707–710
Li C, Lu J, Lu Y (2008) Efficient merging and filtering algorithms for approximate string searches. In: International Conference on Data Engineering (ICDE)
Li Y, Patel JM, Terrell A (2012) Wham: a high-throughput sequence alignment method. ACM Trans Database Syst (TODS) 37(4):28
Lin JL (2003) Mining maximal frequent intervals. In: Proceedings of the ACM Symposium On Applied Computing, pp 624–629
Maier D (1978) The complexity of some problems on subsequences and supersequences. J ACM 25(2):322–336
Mooney C, Roddick JF (2004) Mining relationships between interacting episodes. In: Proceedings of the 4th SIAM International Conference on Data Mining
Mörchen F (2007) Unsupervised pattern mining from symbolic temporal data. SIGKDD Explor Newsl 9:41–55
Mörchen F (2010) Temporal pattern mining in symbolic time point and time interval data. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, KDD ’10, pp 2:1–2:1
Mörchen F, Fradkin D (2010) Robust mining of time intervals with semi-interval partial order patterns. In: Proceedings of the SIAM International Conference on Data Mining, pp 315–326
Moskovitch R, Shahar Y (2009) Medical temporal-knowledge discovery via temporal abstraction. Proceedings of the AMIA Annual Symposium 2009:452–456
Moskovitch R, Shahar Y (2014a) Classification-driven temporal discretization of multivariate time series. Data Min Knowl Discov 29(4):871–913
Moskovitch R, Shahar Y (2014b) Classification of multivariate time series via temporal abstraction and time intervals mining. Knowl Inf Syst 45(1):35–74
Moskovitch R, Shahar Y (2015) Fast time intervals mining using the transitivity of temporal relations. Knowl Inf Syst 42(1):21–48
Munkres J (1957) Algorithms for the assignment and transportation problems. J Soc Ind Appl Math 5(1):32–38
Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48(3):443–453
Orlandic R, Yu B (2002) A retrieval technique for high-dimensional data and partially specified queries. Data Knowl Eng 42(1):1–21. doi:10.1016/S0169-023X(02)00023-X
Pachet F, Ramalho G, Carrive J (1996) Representing temporal musical objects and reasoning in the MusES system. J New Music Res 25(3):252–275
Papapetrou P, Kollios G, Sclaroff S, Gunopulos D (2005) Discovering frequent arrangements of temporal intervals. In: Proceedings of IEEE International Conference on Data Mining, pp 354–361
Papapetrou P, Benson G, Kollios G (2006) Discovering frequent poly-regions in DNA sequences. In: Proceedings of the IEEE ICDM Workshop on Data Mining in Bioinformatics
Papapetrou P, Athitsos V, Kollios G, Gunopulos D (2009a) Reference-based alignment in large sequence databases. Proc VLDB Endow 2(1):205–216
Papapetrou P, Kollios G, Sclaroff S, Gunopulos D (2009b) Mining frequent arrangements of temporal intervals. Knowl Inf Syst 21:133–171
Papapetrou P, Athitsos V, Potamias M, Kollios G, Gunopulos D (2011) Embedding-based subsequence matching in time-series databases. ACM Trans Database Syst 36(3):17:1–17:39
Patel D, Hsu W, Lee M (2008) Mining relationships among interval-based events for classification. In: Proceedings of the 28th ACM SIGMOD International Conference on Management of Data, ACM, pp 393–404
Pissinou N, Radev I, Makki K (2001) Spatio-temporal modeling in video and multimedia geographic information systems. GeoInformatica 5(4):375–409
Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2012) Searching and mining trillions of time series subsequences under dynamic time warping. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12, pp 262–270
Sacchi L, Larizza C, Combi C, Bellazzi R (2007) Data mining with temporal abstractions: learning rules from time series. Data Min Knowl Discov 15(2):217–247
Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Acoust Speech Signal Process 26:43–49
Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147(1):195–197
Umeyama S (1988) An eigendecomposition approach to weighted graph matching problems. IEEE Trans Pattern Anal Mach Intell 10(5):695–703
Venkateswaran J, Lachwani D, Kahveci T, Jermaine C (2006) Reference-based indexing of sequence databases. In: International Conference on Very Large Databases (VLDB), pp 906–917
Villafane R, Hua KA, Tran D, Maulik B (2000) Knowledge discovery from series of interval events. Intell Inf Syst 15(1):71–89
Weber R, Schek HJ, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: Proceedings of the 24th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, VLDB ’98, pp 194–205. http://dl.acm.org/citation.cfm?id=645924.671192
Winarko E, Roddick JF (2007) Armada: an algorithm for discovering richer relative temporal association rules from interval-based data. Data Knowl Eng 63(1):76–90
Wu SY, Chen YL (2007) Mining nonambiguous temporal patterns for interval-based events. IEEE Trans Knowl Data Eng 19(6):742–758
Yang X, Wang B, Li C (2008) Cost-based variable-length-gram selection for string collections to support approximate queries efficiently. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, ACM, pp 353–364
Yi BK, Roh JW (2004) Similarity search for interval time sequences. In: International Conference on Database Systems for Advanced Applications, Springer, pp 232–243
Acknowledgements
The work of Orestis Kostakis was supported in part by the Helsinki Doctoral Education Network in Information and Communications Technology (HICT). The work of Panagiotis Papapetrou was supported in part by the Stockholm City Council (Stockholms Läns Landsting).
Additional information
Responsible editor: Eamonn Keogh.
Cite this article
Kostakis, O., Papapetrou, P. On searching and indexing sequences of temporal intervals. Data Min Knowl Disc 31, 809–850 (2017). https://doi.org/10.1007/s10618-016-0489-3
DOI: https://doi.org/10.1007/s10618-016-0489-3