Machine Learning, Volume 101, Issue 1–3, pp 377–395

Using causal discovery for feature selection in multivariate numerical time series

  • Youqiang Sun
  • Jiuyong Li
  • Jixue Liu
  • Christopher Chow
  • Bingyu Sun
  • Rujing Wang


Time series data contain temporal ordering, which makes feature selection for them different from ordinary feature selection. Feature selection in multivariate time series involves two tasks: identifying the relevant features and finding the effective window sizes of their lagged values. Methods extended from ordinary feature selection do not solve this two-dimensional selection problem because they do not take lagged observations of features into consideration. In this paper, we present a method that uses Granger causality discovery to identify causal features, together with their effective sliding window sizes, in multivariate numerical time series. The proposed method accounts for the influence of lagged observations of features on the target time series. We compare the proposed feature selection method with several ordinary feature selection methods on multivariate time series data, using three well-known modeling methods; our method outperforms the others in predicting future values of the target time series. In a real-world case study on water quality monitoring data, the features selected by our method include four of the five features used by domain experts, and with all three modeling methods the prediction performance on our features is better than on the experts' features.
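The idea described in the abstract can be illustrated with a minimal pairwise Granger-causality check (a NumPy sketch, not the authors' implementation; all function and variable names are illustrative). For a candidate feature and a window size p, a restricted autoregressive model of the target is compared against a full model augmented with p lags of the feature; a large F-statistic suggests that the feature, at that window size, helps predict the target.

```python
import numpy as np

def lagged_design(series, p):
    """Stack columns [s_{t-1}, ..., s_{t-p}], aligned with targets from index p on."""
    n = len(series)
    return np.column_stack([series[p - k:n - k] for k in range(1, p + 1)])

def rss(X, y):
    """Residual sum of squares of an OLS fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    r = y - X1 @ beta
    return float(r @ r)

def granger_f(target, feature, p):
    """F-statistic: restricted model (own lags of the target) vs. the full
    model that adds p lags of the candidate feature."""
    y = target[p:]
    X_r = lagged_design(target, p)                           # own lags only
    X_f = np.column_stack([X_r, lagged_design(feature, p)])  # plus feature lags
    rss_r, rss_f = rss(X_r, y), rss(X_f, y)
    n, k_f = len(y), X_f.shape[1] + 1                        # +1 for the intercept
    return ((rss_r - rss_f) / p) / (rss_f / (n - k_f))

# Synthetic example: x drives y with a lag of 2, z is irrelevant noise.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
z = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.4 * y[t - 1] + 0.8 * x[t - 2] + 0.1 * rng.normal()

# Scan candidate window sizes; in practice a criterion such as the F-test
# p-value or an information criterion would pick the effective window size.
scores_x = {p: granger_f(y, x, p) for p in (1, 2, 3)}
scores_z = {p: granger_f(y, z, p) for p in (1, 2, 3)}
```

Here the F-statistics for x dwarf those for z, so x would be retained (with its best-scoring window size) and z discarded; the paper applies this kind of test across all features and window sizes of a multivariate series.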


Feature selection · Multivariate time series · Causal discovery · Prediction and regression · Granger causality



The authors would like to thank the SA Water Corporation and the SA Water Centre for Water Management and Reuse for supporting the work. The work has also been partially supported by Australian Research Council Grant DP140103617 and the National Natural Science Foundation of China (No: 31171456).



Copyright information

© The Author(s) 2014

Authors and Affiliations

  • Youqiang Sun (1, 2)
  • Jiuyong Li (3)
  • Jixue Liu (3)
  • Christopher Chow (4, 5)
  • Bingyu Sun (1, 2)
  • Rujing Wang (1, 2)

  1. School of Information Science and Technology, University of Science and Technology of China, Hefei, China
  2. Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, China
  3. School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, Australia
  4. Australian Water Quality Centre, SA Water, Adelaide, Australia
  5. SA Water Centre for Water Management and Reuse, University of South Australia, Adelaide, Australia
