Comparison of Deep Neural Networks and Deep Hierarchical Models for Spatio-Temporal Data
Spatio-temporal data are ubiquitous in the agricultural, ecological, and environmental sciences, and their study is important for understanding and predicting a wide variety of processes. Modeling spatial processes that change in time is difficult because the dependence structures that describe how such a process varies are complex, and because the datasets are often high-dimensional and the prediction domains large. It is particularly challenging to specify parameterizations for nonlinear dynamic spatio-temporal models (DSTMs) that are simultaneously useful scientifically and efficient computationally. Statisticians have developed multi-level (deep) hierarchical models that can accommodate process complexity as well as the uncertainties in the predictions and inference. However, these models can be computationally expensive and are typically application specific. On the other hand, the machine learning community has developed alternative “deep learning” approaches for nonlinear spatio-temporal modeling. These models are flexible yet are typically not implemented in a probabilistic framework. The two paradigms have much in common, which suggests hybrid approaches that can benefit from elements of each framework. This overview paper presents a brief introduction to the multi-level (deep) hierarchical DSTM (H-DSTM) framework and to deep models in machine learning, culminating with the deep neural DSTM (DN-DSTM). Recent approaches that combine elements from H-DSTMs and echo state network DN-DSTMs are presented as illustrations. Supplementary materials accompanying this paper appear online.
Keywords: Bayesian, Convolutional neural network, CNN, Dynamic model, Echo state network, ESN, Recurrent neural network, RNN
This work was partially supported by the US National Science Foundation (NSF) and the US Census Bureau under NSF Grant SES-1132031, funded through the NSF-Census Research Network (NCRN) program, and NSF Award DMS-1811745. The author would like to thank Brian Reich for encouraging the writing of this paper, Patrick McDermott for helpful discussions, Nathan Wikle for providing helpful comments on an early draft, and Jennifer Hoeting for encouraging and helpful review comments.