HIME: discovering variable-length motifs in large-scale time series

  • Yifeng Gao
  • Jessica Lin
Regular Paper

Abstract

Detecting repeated variable-length patterns, also called variable-length motifs, has received a great deal of attention in recent years. The state-of-the-art algorithm uses a fixed-length motif discovery algorithm as a subroutine to enumerate variable-length motifs. As a result, it may take hours or days to execute when the enumeration range is large. In this work, we introduce an approximate algorithm called hierarchical-based motif enumeration (HIME) to detect variable-length motifs with a large enumeration range in million-scale time series. We show in the experiments that the proposed algorithm scales significantly better than the state-of-the-art algorithm. Moreover, the motif length range detected by HIME is considerably larger than that of previous sequence-matching-based approximate variable-length motif discovery approaches. We demonstrate that HIME can efficiently detect meaningful variable-length motifs in long, real-world time series.
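
To make the cost argument concrete, the following is a minimal sketch of the enumeration strategy the abstract attributes to prior work: a fixed-length motif search called once for every length in the range. It is neither HIME nor the state-of-the-art algorithm referred to above. The brute-force pair search, the use of z-normalized Euclidean distance, the function names (`znorm`, `fixed_length_motif`, `enumerate_motifs`), and the toy series are all assumptions chosen for illustration.

```python
import math

def znorm(seq):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std < 1e-12:
        return [0.0] * n
    return [(x - mean) / std for x in seq]

def fixed_length_motif(series, m):
    """Brute-force closest non-overlapping pair of length-m subsequences.

    Returns (distance, i, j), or None if no non-overlapping pair exists.
    """
    subs = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = None
    for i in range(len(subs)):
        for j in range(i + m, len(subs)):  # j >= i + m skips overlapping (trivial) matches
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if best is None or d < best[0]:
                best = (d, i, j)
    return best

def enumerate_motifs(series, min_len, max_len):
    """Call the fixed-length subroutine once per length in [min_len, max_len]."""
    result = {}
    for m in range(min_len, max_len + 1):
        motif = fixed_length_motif(series, m)
        if motif is not None:
            result[m] = motif
    return result

# Toy series (hypothetical data): the same 5-point bump planted at offsets 6 and 18.
pattern = [0.0, 1.0, 3.0, 1.0, 0.0]
filler_a = [0.3, -0.2, 0.5, 0.1, -0.4, 0.2]
filler_b = [-0.1, 0.4, 0.0, -0.3, 0.6, -0.5, 0.1]
series = filler_a + pattern + filler_b + pattern + [0.2, 0.1]

motifs = enumerate_motifs(series, 4, 6)
for m, (d, i, j) in sorted(motifs.items()):
    print(f"length {m}: offsets ({i}, {j}), distance {d:.4f}")
```

In this sketch the total work is the cost of one fixed-length search multiplied by the number of lengths tried, which is why a wide enumeration range becomes expensive on long series even when the subroutine itself is fast.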

Keywords

Time-series motif discovery, Variable-length, Scalable algorithm

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. George Mason University, Fairfax, USA
