A Novel Decision Tree Approach for the Handling of Time Series

  • Andrea Brunello (corresponding author)
  • Enrico Marzano
  • Angelo Montanari
  • Guido Sciavicco
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11308)

Abstract

Time series play a major role in many analysis tasks. In the stock market, for instance, they can be used to model price histories and to predict future trends. The information contained in a time series is sometimes complemented by other kinds of data, which may be encoded by static attributes, e.g., categorical or numeric ones, or by more general discrete data sequences. In this paper, we present J48SS, a novel decision tree learning algorithm capable of natively mixing static, sequential, and time series data for classification purposes. The proposed solution is based on the well-known C4.5 decision tree learner and relies on the concept of time series shapelets, which are generated by means of multi-objective evolutionary computation techniques and, unlike in most previous approaches, are not required to be taken from the training set. We evaluate the algorithm on a set of well-known UCR time series datasets, showing that it achieves better classification performance than previous decision tree-based approaches, while generating highly interpretable models and effectively reducing the data preparation effort.
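
To make the shapelet-based splitting idea more concrete, the following is a minimal sketch (not the authors' implementation) of the standard shapelet matching primitive used in this line of work: the distance between a shapelet and a time series is the minimum Euclidean distance over all equal-length subsequences, and a tree node can then test that distance against a learned threshold. Function and variable names are illustrative.

    import numpy as np

    def shapelet_distance(series: np.ndarray, shapelet: np.ndarray) -> float:
        """Minimum Euclidean distance between a shapelet and all
        equal-length subsequences of a time series."""
        n, m = len(series), len(shapelet)
        if m > n:
            raise ValueError("shapelet is longer than the series")
        # Slide the shapelet over the series and keep the best match.
        return min(
            float(np.linalg.norm(series[i:i + m] - shapelet))
            for i in range(n - m + 1)
        )

    # A decision tree node can then branch on this value, e.g.
    # "shapelet_distance(x, s) <= threshold" sends instance x to the left child.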

Keywords

Decision trees · Time series · Evolutionary computation

Notes

Acknowledgments

Andrea Brunello and Angelo Montanari would like to thank the PRID project ENCASE (Efforts in the uNderstanding of Complex interActing SystEms) for its support.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Andrea Brunello (1, corresponding author)
  • Enrico Marzano (2)
  • Angelo Montanari (1)
  • Guido Sciavicco (3)

  1. University of Udine, Udine, Italy
  2. Gap s.r.l.u., Trieste, Italy
  3. University of Ferrara, Ferrara, Italy