Temporal pattern attention for multivariate time series forecasting
Forecasting of multivariate time series data, for instance the prediction of electricity consumption, solar power production, and polyphonic piano pieces, has numerous valuable applications. However, complex and non-linear interdependencies between time steps and series complicate this task. To obtain accurate predictions, it is crucial to model long-term dependencies in time series data, which can be achieved by recurrent neural networks (RNNs) with an attention mechanism. The typical attention mechanism reviews the information at each previous time step and selects relevant information to help generate the outputs; however, it fails to capture temporal patterns across multiple time steps. In this paper, we propose using a set of filters to extract time-invariant temporal patterns, similar to transforming time series data into its “frequency domain”. We then propose a novel attention mechanism that selects relevant time series and uses their frequency domain information for multivariate forecasting. We apply the proposed model to several real-world tasks and achieve state-of-the-art performance in almost all cases. Our source code is available at https://github.com/gantheory/TPA-LSTM.
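The contrast drawn above, between attention that weights individual time steps and attention that weights temporal patterns extracted by filters, can be sketched in a few lines. This is a minimal, dependency-free illustration under stated assumptions, not the released TPA-LSTM code: the function names `conv_rows`, `step_attention`, and `temporal_pattern_attention`, the matrix `W_a`, the layout of the hidden-state matrix (rows are hidden dimensions, columns are past time steps), the filter length equal to the window length, the bilinear scoring function, and the use of a sigmoid instead of a softmax are all assumptions made for this example.

```python
import math


def conv_rows(H, filters):
    """Apply each 1-D filter over the whole time axis of every row of H.

    H: m rows (hidden dimensions) x w columns (past time steps).
    filters: k filters, each of length w (assumed equal to the window length).
    Returns HC (m x k): HC[i][j] is the response of row i to filter j.
    """
    return [[sum(h * c for h, c in zip(row, f)) for f in filters] for row in H]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def step_attention(H, h_t):
    """Conventional dot-product attention: one softmax weight per time step (column)."""
    w = len(H[0])
    scores = [sum(H[i][s] * h_t[i] for i in range(len(H))) for s in range(w)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context: weighted average of the hidden-state columns.
    context = [sum(weights[s] * row[s] for s in range(w)) for row in H]
    return weights, context


def temporal_pattern_attention(H, h_t, filters, W_a):
    """Hypothetical sketch: one weight per row of the filtered matrix HC.

    W_a: k x m scoring matrix (an assumed learned parameter).
    Returns (alpha, v_t): per-row weights (length m) and context vector (length k).
    """
    HC = conv_rows(H, filters)
    k, m = len(filters), len(h_t)
    Wh = [sum(W_a[j][i] * h_t[i] for i in range(m)) for j in range(k)]  # W_a h_t
    # Independent sigmoid weights, so several rows can be selected at once.
    alpha = [sigmoid(sum(HC[i][j] * Wh[j] for j in range(k))) for i in range(len(HC))]
    # Context: weighted sum of the rows' temporal-pattern vectors.
    v_t = [sum(alpha[i] * HC[i][j] for i in range(len(HC))) for j in range(k)]
    return alpha, v_t


if __name__ == "__main__":
    H = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]       # m = 2 dimensions, w = 3 steps
    filters = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # k = 2 toy filters
    W_a = [[1.0, 0.0], [0.0, 1.0]]
    h_t = [1.0, 0.0]
    print(step_attention(H, h_t))                 # weights over the 3 time steps
    print(temporal_pattern_attention(H, h_t, filters, W_a))  # alpha[0] ~ 0.731, alpha[1] = 0.5
```

The design difference is where the weights live: `step_attention` assigns a weight to each time step, whereas `temporal_pattern_attention` first summarizes each row across all steps with the filters and then weights rows. In a full forecasting model, `v_t` would still have to be combined with `h_t` to produce the prediction; that step is omitted from this sketch.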
Keywords: Multivariate time series; Attention mechanism; Recurrent neural network; Convolutional neural network; Polyphonic music generation
This work was financially supported by the Ministry of Science and Technology of Taiwan.