DSCo-NG: A Practical Language Modeling Approach for Time Series Classification

  • Daoyuan Li
  • Tegawendé F. Bissyandé
  • Jacques Klein
  • Yves Le Traon
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9897)

Abstract

The abundance of time series data across domains, together with its high dimensionality, makes it challenging to extract useful information. To tackle storage and processing challenges, compression-based techniques have been proposed. Our previous work, Domain Series Corpus (DSCo), compresses time series into symbolic strings and uses language modeling techniques to extract knowledge about the different classes from the training set. However, that approach was flawed in practice because of its excessive memory usage and its need for a priori knowledge about the dataset. In this paper we propose DSCo-NG, which reduces DSCo’s complexity and offers an approach to time series classification that is efficient (linear time complexity and low memory footprint), accurate (performance comparable to approaches working on uncompressed data) and generic (applicable to various domains). These claims are backed by an extensive experimental evaluation on publicly accessible datasets, which also offers insights into when DSCo-NG can be a better choice than other approaches.
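
To make the general idea concrete, the following is a minimal Python sketch of classifying time series by compressing them into symbolic strings and scoring those strings with per-class language models. It is not the DSCo-NG algorithm itself: the equal-width binning, the bigram model, the add-one smoothing and all function names (symbolize, train, score, classify) are assumptions chosen for illustration, since the abstract does not specify these details.

```python
import math
from collections import Counter, defaultdict


def symbolize(series, alphabet="abcd"):
    """Map each value to a letter using equal-width bins over the series' own range.

    Assumption: equal-width binning stands in for whatever compression
    scheme the paper actually uses.
    """
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # avoid division by zero for constant series
    k = len(alphabet)
    return "".join(alphabet[min(int((x - lo) / span * k), k - 1)] for x in series)


def train(labelled_series, alphabet="abcd"):
    """Build one bigram count table per class label (hypothetical helper)."""
    models = defaultdict(Counter)
    for series, label in labelled_series:
        text = symbolize(series, alphabet)
        models[label].update(zip(text, text[1:]))
    return models


def score(text, bigrams, vocab_size):
    """Add-one smoothed bigram log-likelihood of a symbolic string."""
    context = Counter()
    for (first, _), count in bigrams.items():
        context[first] += count
    return sum(
        math.log((bigrams[(a, b)] + 1) / (context[a] + vocab_size))
        for a, b in zip(text, text[1:])
    )


def classify(series, models, alphabet="abcd"):
    """Return the label whose language model gives the string the highest score."""
    text = symbolize(series, alphabet)
    return max(models, key=lambda label: score(text, models[label], len(alphabet)))


# Toy usage: one rising and one falling training series.
models = train([
    ([0, 1, 2, 3, 4, 5, 6, 7], "rising"),
    ([7, 6, 5, 4, 3, 2, 1, 0], "falling"),
])
print(classify([10, 20, 30, 40], models))
print(classify([40, 30, 20, 10], models))
```

In this sketch, symbolizing and scoring each take a single pass over the series, and each class model stores only a small table of bigram counts, which is in the spirit of the linear time and low memory goals stated above.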

Notes

Acknowledgment

The authors would like to thank Paul Wurth S.A. and Luxembourg Ministry of Economy for sponsoring this research work.


Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Daoyuan Li (1)
  • Tegawendé F. Bissyandé (1)
  • Jacques Klein (1)
  • Yves Le Traon (1)

  1. Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Luxembourg, Luxembourg