Accelerating pattern-based time series classification: a linear time and space string mining approach

  • Atif Raza
  • Stefan Kramer
Regular Paper

Abstract

Subsequence-based time series classification algorithms provide interpretable and generally more accurate classification models than the nearest-neighbor approach, albeit at a considerably higher computational cost. A number of algorithms based on discretized time series have been proposed to reduce this computational cost; however, their asymptotic time complexity is still cubic or of an even higher polynomial order. We present a remarkably fast and resource-efficient time series classification approach that employs a linear time and space string mining algorithm to extract frequent patterns from discretized time series data. Compared to other subsequence- or pattern-based classification algorithms, the proposed approach requires only a few parameters, which can be chosen arbitrarily and do not need to be fine-tuned for different datasets. The time series data are discretized using symbolic aggregate approximation (SAX), and frequent patterns are extracted using a string mining algorithm. An independence test is used to select the most discriminative frequent patterns, which are subsequently used to create a transformed version of the time series data. Finally, a classification model can be trained using any off-the-shelf algorithm. Extensive empirical evaluations demonstrate that our approach achieves classification accuracy competitive with other state-of-the-art approaches. The experiments also show that our approach is at least one to two orders of magnitude faster than existing pattern-based methods, owing to its extremely fast frequent pattern extraction, which is the most computationally intensive step in pattern-based time series classification approaches.
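
The pipeline described above (discretize, mine frequent patterns, select patterns with an independence test, transform, classify) can be illustrated with a small sketch. This is not the authors' implementation, and several details are assumptions: each whole series is discretized into a single SAX string (z-normalization, piecewise aggregate approximation with a hypothetical frame length `frame`, equiprobable Gaussian breakpoints); a pattern counts as frequent if it occurs in at least `min_support` strings; frequent patterns are found by brute-force substring enumeration, which merely stands in for the linear time and space string mining algorithm used in the paper; the independence test is assumed to be a chi-squared test of pattern presence versus class; and the transformed data are binary presence vectors. All function and parameter names (`sax`, `frequent_patterns`, `min_support`, `max_len`, `chi2_presence`, `select_patterns`, `transform`, `k`) are hypothetical.

```python
from collections import Counter
from statistics import NormalDist, fmean, pstdev


def sax(series, frame, alphabet_size):
    """Discretize one series into a SAX string (assumed: one symbol per PAA frame)."""
    mu, sigma = fmean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # PAA: mean of each non-overlapping frame; a trailing partial frame is dropped.
    paa = [fmean(z[i:i + frame]) for i in range(0, len(z) - frame + 1, frame)]
    # Breakpoints that split N(0, 1) into equiprobable regions.
    cuts = [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    # Symbol index = number of breakpoints below the PAA value.
    return "".join(chr(ord("a") + sum(v > c for c in cuts)) for v in paa)


def frequent_patterns(strings, min_support, min_len=2, max_len=8):
    """Substrings (length min_len..max_len) occurring in >= min_support strings.

    Brute-force substring enumeration: a stand-in, NOT the linear time and
    space string mining algorithm described in the paper.
    """
    support = Counter()
    for s in strings:
        # A set ensures each string contributes at most once per pattern.
        support.update({s[i:j] for i in range(len(s))
                        for j in range(i + min_len, min(i + max_len, len(s)) + 1)})
    return sorted(p for p, c in support.items() if c >= min_support)


def chi2_presence(pattern, strings, labels):
    """Chi-squared statistic of pattern presence vs. class (assumed independence test)."""
    n = len(strings)
    class_counts = Counter(labels)
    observed = Counter((pattern in s, y) for s, y in zip(strings, labels))
    stat = 0.0
    for present in (True, False):
        row = sum(observed[(present, c)] for c in class_counts)
        for c, col in class_counts.items():
            expected = row * col / n
            if expected > 0:  # skip empty rows (pattern in all or no strings)
                stat += (observed[(present, c)] - expected) ** 2 / expected
    return stat


def select_patterns(patterns, strings, labels, k):
    """Keep the k patterns with the largest statistic (ties keep input order)."""
    return sorted(patterns, key=lambda p: chi2_presence(p, strings, labels),
                  reverse=True)[:k]


def transform(strings, selected):
    """Binary feature vector per string: 1 if the selected pattern occurs in it."""
    return [[int(p in s) for p in selected] for s in strings]


if __name__ == "__main__":
    rising = [[0, 1, 2, 3, 4, 5, 6, 7], [0, 2, 4, 6, 8, 10, 12, 14]]
    falling = [list(reversed(x)) for x in rising]
    series, labels = rising + falling, [0, 0, 1, 1]
    words = [sax(x, frame=2, alphabet_size=4) for x in series]
    patterns = frequent_patterns(words, min_support=2, max_len=2)
    selected = select_patterns(patterns, words, labels, k=2)
    print(words, selected, transform(words, selected))
```

Running the script prints `['abcd', 'abcd', 'dcba', 'dcba'] ['ab', 'ba'] [[1, 0], [1, 0], [0, 1], [0, 1]]`: the z-normalized rising ramps both map to `abcd` and the falling ramps to `dcba`, every length-2 pattern separates the two toy classes perfectly, and the tie is broken by alphabetical order. The resulting 0/1 matrix could then be handed to any off-the-shelf classifier, as the abstract notes.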

Keywords

Time series · Classification · String mining · Linear time and space

Notes

Acknowledgements

We are grateful to the reviewers for their comments and suggestions, which helped improve the quality of this paper. The first author was supported by a scholarship from the Higher Education Commission (HEC), Pakistan, and the German Academic Exchange Service (DAAD), Germany.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Institute of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany