Using causal discovery for feature selection in multivariate numerical time series
Time series data are temporally ordered, which makes feature selection on them different from conventional feature selection. Feature selection in multivariate time series involves two tasks: identifying the relevant features and finding the effective window size of lagged values for each. Methods extended from conventional feature selection do not solve this two-dimensional problem because they do not take lagged observations of features into account. In this paper, we present a method that uses Granger causality discovery to identify causal features, together with their effective sliding window sizes, in multivariate numerical time series. The proposed method considers the influence of lagged observations of features on the target time series. We compare it with several conventional feature selection methods on multivariate time series data using three well-known modeling methods, and it outperforms them in predicting future values of the target time series. In a real-world case study on water quality monitoring data, the features selected by our method include four of the five features used by domain experts, and prediction performance with our features is better than with the experts' features under the three modeling methods.
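To make the two-dimensional selection problem concrete, the following is a minimal sketch of a pairwise Granger-style test, not the procedure of the paper itself. For each candidate feature and each lag up to a maximum, it compares an autoregressive model of the target against the same model augmented with lagged values of the feature, using the standard F statistic on the residual sums of squares. The function names, the fixed F threshold, and the rule of keeping the lag with the largest F statistic as the window size are all assumptions made for illustration.

```python
import numpy as np

def _rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def granger_f_stat(target, feature, lag):
    """F statistic for 'feature Granger-causes target' with a given lag.

    Restricted model:   target_t ~ const + target_{t-1..t-lag}
    Unrestricted model: restricted + feature_{t-1..t-lag}
    """
    target = np.asarray(target, dtype=float)
    feature = np.asarray(feature, dtype=float)
    T = len(target)
    y = target[lag:]
    # Column k holds the series shifted back by k steps, aligned with y.
    own = np.column_stack([target[lag - k:T - k] for k in range(1, lag + 1)])
    other = np.column_stack([feature[lag - k:T - k] for k in range(1, lag + 1)])
    const = np.ones((T - lag, 1))
    X_restricted = np.hstack([const, own])
    X_unrestricted = np.hstack([const, own, other])
    rss_r = _rss(X_restricted, y)
    rss_u = _rss(X_unrestricted, y)
    df_num = lag
    df_den = (T - lag) - X_unrestricted.shape[1]
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den)

def select_features(target, features, max_lag, f_threshold):
    """Return {feature name: chosen lag} for features passing the threshold.

    Assumption for this sketch: the window size of a feature is the lag
    with the largest F statistic, and f_threshold is a hypothetical,
    user-chosen cut-off rather than a calibrated critical value.
    """
    selected = {}
    for name, series in features.items():
        stats = {lag: granger_f_stat(target, series, lag)
                 for lag in range(1, max_lag + 1)}
        best_lag = max(stats, key=stats.get)
        if stats[best_lag] > f_threshold:
            selected[name] = best_lag
    return selected

# Synthetic usage: the target depends on x two steps back; z is unrelated.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
z = rng.normal(size=n)
y = np.zeros(n)
y[2:] = 0.8 * x[:-2] + 0.1 * rng.normal(size=n - 2)
print(select_features(y, {"x": x, "z": z}, max_lag=4, f_threshold=25.0))
```

In practice the F statistic would be converted to a p-value against the F distribution with `df_num` and `df_den` degrees of freedom, and the series would first be checked for stationarity; both steps are omitted here to keep the sketch short.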
Keywords: Feature selection, Multivariate time series, Causal discovery, Prediction and regression, Granger causality
The authors would like to thank the SA Water Corporation and the SA Water Centre for Water Management and Reuse for supporting the work. The work has also been partially supported by Australian Research Council Grant DP140103617 and the National Natural Science Foundation of China (No: 31171456).