Evolution of a Data Series Index

The iSAX Family of Data Series Indexes: iSAX, iSAX2.0, iSAX2+, ADS, ADS+, ADS-Full, ParIS, ParIS+, MESSI, DPiSAX, ULISSE, Coconut-Trie/Tree, Coconut-LSM
  • Themis Palpanas
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1197)

Abstract

Applications across diverse domains increasingly need techniques that can index and mine very large collections of sequences, or data series. It is not unusual for these applications to involve billions of data series, which are often not analyzed in full detail because of their sheer size. In this work, we describe techniques for indexing and efficient similarity search in truly massive collections of data series, focusing on the iSAX family of data series indexes. We present their design characteristics and describe how they evolved to address different needs: bulk loading, adaptive indexing, parallelism and distribution, variable-length query answering, and bottom-up indexing. Based on this discussion, we conclude by presenting promising research directions.

Keywords

Data series, Time series, Sequences, Indexing, Analytics

Notes

Acknowledgements

I would like to thank my collaborators (in alphabetical order): R. Akbarinia, H. Benbrahim, A. Bezerianos, A. Camerra, M. Dallachiesa, N. Dayan, K. Echihabi, A. Gogolou, P. Fatourou, J. Gehrke, S. Idreos, I. Ilyas, E. Keogh, B. Kolev, H. Kondylakis, O. Levchenko, M. Linardi, Y. Lou, F. Masseglia, K. Mirylenka, B. Nushi, B. Peng, T. Rakthanmanon, D. Shasha, J. Shieh, T. Tsandilas, P. Valduriez, and D.-E. Yagoubi. Special thanks go to K. Zoumpatianos.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of Paris, Paris, France
