Abstract
The abundance of time series data in many domains, together with its high dimensionality, makes it challenging to harvest useful information from it. To tackle the storage and processing challenges, compression-based techniques have been proposed. Our previous work, Domain Series Corpus (DSCo), compresses time series into symbolic strings and uses language modeling techniques to extract knowledge about the different classes from the training set. However, that approach proved impractical because of its excessive memory usage and its need for a priori knowledge about the dataset. In this paper we propose DSCo-NG, which reduces DSCo’s complexity and offers an efficient (linear time complexity and low memory footprint), accurate (performance comparable to approaches working on uncompressed data) and generic (applicable to various domains) approach to time series classification. These claims are backed by an extensive experimental evaluation on publicly accessible datasets, which also offers insights into when DSCo-NG can be a better choice than other approaches.
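To make the general idea concrete, the following is a minimal sketch of classifying time series by compressing them into symbolic strings and scoring those strings with per-class language models. It is an illustration only and not the DSCo or DSCo-NG implementation: the equal-width binning, the add-one smoothed bigram model, and all names (`to_symbols`, `train`, `log_likelihood`, `classify`, the `segment` and `alphabet` parameters) are assumptions chosen for brevity.

```python
import math
from collections import Counter, defaultdict


def to_symbols(series, segment=2, alphabet="abcd"):
    """Compress a numeric series into a string (hypothetical scheme):
    z-normalise, average over fixed-size segments, bin each mean to a letter."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series)) or 1.0
    z = [(x - mean) / std for x in series]
    chunks = [z[i:i + segment] for i in range(0, len(z), segment)]
    width = 4.0 / len(alphabet)  # assumed: equal-width bins over [-2, 2)
    symbols = []
    for chunk in chunks:
        m = sum(chunk) / len(chunk)
        clipped = min(max(m, -2.0), 2.0 - 1e-9)
        symbols.append(alphabet[int((clipped + 2.0) / width)])
    return "".join(symbols)


def train(labelled_strings):
    """Build one bigram count table per class label."""
    models = defaultdict(Counter)
    for label, s in labelled_strings:
        models[label].update(zip(s, s[1:]))
    return models


def log_likelihood(s, counts, alphabet="abcd"):
    """Add-one smoothed bigram log-probability of s under one class model."""
    total = 0.0
    for prev, nxt in zip(s, s[1:]):
        context = sum(counts[(prev, c)] for c in alphabet)
        total += math.log((counts[(prev, nxt)] + 1) / (context + len(alphabet)))
    return total


def classify(s, models, alphabet="abcd"):
    """Return the label whose model gives the string the highest score."""
    return max(models, key=lambda lb: log_likelihood(s, models[lb], alphabet))


# Toy usage: one rising and one falling training series.
models = train([
    ("up", to_symbols([0, 1, 2, 3, 4, 5, 6, 7])),
    ("down", to_symbols([7, 6, 5, 4, 3, 2, 1, 0])),
])
print(classify(to_symbols([1, 2, 3, 5, 8, 9, 12, 13]), models))
```

In this sketch, training and scoring each take a single pass over the symbolic strings, which is the kind of property the abstract refers to when it mentions linear time complexity; the actual discretisation and model choices are described in the paper itself.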
Notes
1. Repository is available at https://github.com/serval-snt-uni-lu/dsco.
Acknowledgment
The authors would like to thank Paul Wurth S.A. and the Luxembourg Ministry of Economy for sponsoring this research work.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Li, D., Bissyandé, T.F., Klein, J., Le Traon, Y. (2016). DSCo-NG: A Practical Language Modeling Approach for Time Series Classification. In: Boström, H., Knobbe, A., Soares, C., Papapetrou, P. (eds) Advances in Intelligent Data Analysis XV. IDA 2016. Lecture Notes in Computer Science, vol 9897. Springer, Cham. https://doi.org/10.1007/978-3-319-46349-0_1
DOI: https://doi.org/10.1007/978-3-319-46349-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46348-3
Online ISBN: 978-3-319-46349-0
eBook Packages: Computer Science, Computer Science (R0)