# Accelerating pattern-based time series classification: a linear time and space string mining approach

## Abstract

Subsequence-based time series classification algorithms provide interpretable and generally more accurate classification models than the nearest neighbor approach, albeit at a considerably higher computational cost. A number of algorithms based on discretized time series have been proposed to reduce this computational cost; however, their asymptotic time complexity is still cubic or a higher-order polynomial. We present a remarkably fast and resource-efficient time series classification approach that employs a linear time and space string mining algorithm to extract frequent patterns from discretized time series data. Compared to other subsequence- or pattern-based classification algorithms, the proposed approach requires only a few parameters, which can be chosen arbitrarily and need no fine-tuning for different datasets. The time series data are discretized using symbolic aggregate approximation (SAX), and frequent patterns are extracted using a string mining algorithm. An independence test is used to select the most discriminative frequent patterns, which are subsequently used to create a transformed version of the time series data. Finally, a classification model can be trained on the transformed data using any off-the-shelf algorithm. Extensive empirical evaluations demonstrate that the classification accuracy of our approach is competitive with other state-of-the-art approaches. The experiments also show that our approach is at least one to two orders of magnitude faster than existing pattern-based methods, owing to the extremely fast frequent pattern extraction, which is the most computationally intensive step in pattern-based time series classification.
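The pipeline outlined in the abstract (discretize, mine frequent patterns, select discriminative ones, transform) can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names and parameter defaults here are hypothetical. The substring enumeration is a naive quadratic stand-in for the linear time and space string mining algorithm. The chi-squared statistic is assumed as the independence test, since the abstract does not name one. The SAX step assumes each series has at least `n_segments` points.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments=4, alphabet="abcd"):
    """Discretize one series: z-normalize, average per segment (PAA), map to symbols."""
    mu, sigma = mean(series), pstdev(series) or 1.0  # guard against constant series
    z = [(x - mu) / sigma for x in series]
    # Breakpoints split the standard normal into equiprobable regions.
    k = len(alphabet)
    cuts = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    seg = len(z) / n_segments
    word = []
    for s in range(n_segments):
        avg = mean(z[int(s * seg):int((s + 1) * seg)])
        word.append(alphabet[sum(avg > c for c in cuts)])
    return "".join(word)


def substrings(word, min_len=2):
    """All distinct substrings of at least min_len symbols (naive, NOT linear time)."""
    return {word[i:j] for i in range(len(word))
            for j in range(i + min_len, len(word) + 1)}


def chi2(pattern, words, labels):
    """Chi-squared statistic of the table: pattern present/absent vs. class label."""
    classes = sorted(set(labels))
    observed = {(present, c): 0 for present in (True, False) for c in classes}
    for w, y in zip(words, labels):
        observed[(pattern in w, y)] += 1
    n = len(words)
    stat = 0.0
    for (present, c), obs in observed.items():
        row = sum(observed[(present, k)] for k in classes)
        col = sum(observed[(p, c)] for p in (True, False))
        expected = row * col / n
        if expected:
            stat += (obs - expected) ** 2 / expected
    return stat


def select_patterns(words, labels, min_support=2, top_k=3):
    """Keep patterns found in at least min_support words, ranked by chi-squared."""
    support = {}
    for w in words:
        for p in substrings(w):
            support[p] = support.get(p, 0) + 1
    frequent = [p for p, c in support.items() if c >= min_support]
    # Sort by descending statistic, then alphabetically for a deterministic order.
    return sorted(frequent, key=lambda p: (-chi2(p, words, labels), p))[:top_k]


def transform(words, patterns):
    """Binary feature matrix: does each selected pattern occur in each word?"""
    return [[int(p in w) for p in patterns] for w in words]
```

The rows returned by `transform` can then be passed, together with the class labels, to any off-the-shelf classifier. Binary presence features are one possible encoding; occurrence counts would be an equally plausible choice under the same description.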

## Keywords

Time series, Classification, String mining, Linear time and space

## Notes

### Acknowledgements

We are grateful to the reviewers for their comments and suggestions which helped in improving the quality of this paper. The first author was supported by a scholarship from the Higher Education Commission (HEC), Pakistan, and the German Academic Exchange Service (DAAD), Germany.
