Similarity Analysis of Time Interval Data Sets—A Graph Theory Approach

  • Marc Haßler
  • Christian Kohlschein
  • Tobias Meisen
Conference paper
Part of the Contributions to Statistics book series (CONTRIB.STAT.)


Comparison of entities, i.e., the measurement of their similarity, is a frequent but challenging task in computer science. It requires a precise and quantifiable definition of similarity itself. Are two texts equal if they share a majority of their words? Do two pictures depict the same content? What defines the sameness of two songs? While certain distance-based approaches, e.g., the Minkowski distance, make for a good starting point in defining similarity, there is no one-size-fits-all approach. In this work, we tackle a particularly interesting problem, namely, the definition of a similarity measure for comparing time interval data sets. Our approach regards the data sets as the disjoint parts of a bigraph, thereby allowing methods from graph theory to be applied. We present a formal definition of the similarity of two time intervals, our methods, and a concrete use case from the medical domain, thus demonstrating the applicability to real-world scenarios.
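The bigraph view can be illustrated with a minimal sketch. The abstract does not state how edges are weighted or how a matching is chosen, so everything here is an assumption made for illustration: each interval is a hypothetical (start, end) pair, an edge between two intervals is weighted by their Minkowski distance, and the data set distance is taken as the cheapest one-to-one pairing, found by brute force. The function names `minkowski` and `matching_distance` are invented for this example and do not come from the paper.

```python
from itertools import permutations


def minkowski(a, b, p=2):
    """Minkowski distance of order p between two (start, end) intervals."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def matching_distance(set_a, set_b, p=2):
    """Cheapest one-to-one pairing between two interval sets.

    Assumption: the two sets form the two sides of a bipartite graph and
    every cross pair is an edge weighted by the Minkowski distance.
    Brute force over all pairings, so only suitable for tiny sets.
    """
    # Make set_a the smaller side so each of its intervals gets a partner.
    if len(set_a) > len(set_b):
        set_a, set_b = set_b, set_a
    best = float("inf")
    for perm in permutations(range(len(set_b)), len(set_a)):
        cost = sum(minkowski(set_a[i], set_b[j], p) for i, j in enumerate(perm))
        best = min(best, cost)
    return best


# Two small, made-up interval sets (units are arbitrary).
set_a = [(0, 2), (5, 8)]
set_b = [(5, 9), (0, 2)]
result = matching_distance(set_a, set_b, p=1)
print(result)
```

A lower value means the two sets can be paired up more cheaply, i.e., they are more similar under these assumptions. For larger sets, a polynomial-time assignment algorithm would replace the brute-force loop.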


Graph theory, Time interval data set, Similarity analysis, Medical data analysis, Distance measures



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Institute of Information Management in Mechanical Engineering, RWTH Aachen University, Aachen, Germany
