Neural Networks

Part of the book series: Advanced Studies in Theoretical and Applied Econometrics (ASTA, volume 52)

Abstract

In the past 10 years, neural networks have emerged as a powerful tool for predictive modeling with “big data.” This chapter discusses the potential role of neural networks as applied to economic forecasting. It begins with a brief discussion of the history of neural networks, their use in economics, and their value as universal function approximators. It proceeds to introduce the elemental structures of neural networks, taking the classic feed-forward, fully connected type of neural network as its point of reference. A broad set of design decisions is discussed, including regularization, activation functions, and model architecture. Following this, two additional types of neural network model are discussed: recurrent neural networks and encoder-decoder models. The chapter concludes with an empirical application of all three models to the task of forecasting unemployment.

The views expressed are those of the author and do not necessarily reflect the views of the Federal Reserve Bank of Kansas City or the Federal Reserve System.
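
The chapter takes the classic feed-forward, fully connected network as its point of reference. As a rough, self-contained sketch of that model class (illustrative only and not code from the chapter; the toy series, single hidden layer, ReLU activation, and learning rate are all assumptions made here), the snippet below fits such a network to lagged values of a series and produces a one-step-ahead forecast:

```python
import numpy as np

# Illustrative sketch only: a one-hidden-layer, fully connected network trained
# by gradient descent to map p lagged values of a series onto its next value.
# The toy series, layer width, ReLU activation, and learning rate are
# assumptions for demonstration, not settings taken from the chapter.
rng = np.random.default_rng(0)

def make_lagged(y, p):
    """Stack p lagged values as inputs X and the next observation as target t."""
    X = np.column_stack([y[i:len(y) - p + i] for i in range(p)])
    return X, y[p:]

y = np.sin(np.linspace(0, 20, 400)) + 0.1 * rng.normal(size=400)  # stand-in series
X, t = make_lagged(y, p=4)

n_hidden = 16
W1 = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

lr = 1e-2
for _ in range(2000):
    h = np.maximum(X @ W1 + b1, 0.0)      # hidden layer with ReLU activation
    yhat = (h @ W2 + b2).ravel()          # linear output node for regression
    err = yhat - t                        # drives the mean-squared-error gradient
    gW2 = h.T @ err[:, None] / len(t)
    gb2 = err.mean(keepdims=True)
    dh = (err[:, None] @ W2.T) * (h > 0)  # backpropagate through the ReLU
    gW1 = X.T @ dh / len(t)
    gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# One-step-ahead forecast from the four most recent observations
one_step_forecast = (np.maximum(y[-4:][None, :] @ W1 + b1, 0.0) @ W2 + b2).item()
```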

Notes

  1. Specifically, a subset of the ImageNet dataset. See Russakovsky et al. (2015).

  2. We use log-loss because it is the convention (in both the machine learning and statistical literature) for this type of categorization problem. Other loss functions, including mean squared error, would likely work as well; a brief illustration comparing the two appears after these notes.

  3. See Bland (1998) and Rumelhart et al. (1985) for further discussion.

  4. In this setting, the goal of the model fitting process would be to minimize this objective function.

  5. An alternative form of the wavelet neural network uses wavelet functions as activation functions for hidden nodes in the network. This form of wavelet network, however, is designed to improve optimization speed, create self-assembling networks, or achieve ends other than accommodating non-stationary data.

  6. We focus here on a sequence of scalar values. All discussion in this section extends to sequences of multi-dimensional input (e.g., a sequence of vectors).

  7. In other words, the output of the decoder LSTM prior to the final, fully connected layer.

  8. This is to be contrasted with an iterative model, in which the next step ahead is forecast and iterative extrapolation is then used to generate a prediction for the desired forecast horizon; a sketch contrasting the two approaches appears after these notes.
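
As mentioned in note 2, log-loss is the conventional objective for this type of categorization problem, though mean squared error would likely work as well. A small illustration (not code from the chapter; the binary labels and predicted probabilities are made up) evaluating both criteria on the same predictions:

```python
import numpy as np

# Made-up binary labels and predicted class probabilities, for illustration only
y = np.array([1, 0, 1, 1, 0])
p = np.array([0.9, 0.2, 0.6, 0.8, 0.1])

eps = 1e-12  # guards against taking log(0)
log_loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
mse = np.mean((y - p) ** 2)
print(f"log-loss: {log_loss:.4f}, mean squared error: {mse:.4f}")
```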

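As distinguished in note 8, a direct model forecasts the target horizon in one step, while an iterative model forecasts the next step ahead and extrapolates recursively (see Marcellino et al. 2006). The sketch below contrasts the two using a least-squares AR(1) as a stand-in for a neural network forecaster; the simulated series, horizon, and estimator are illustrative assumptions only:

```python
import numpy as np

# Contrast between iterated and direct h-step-ahead forecasts, using a simple
# AR(1) fit by least squares as a stand-in for a neural network forecaster.
rng = np.random.default_rng(1)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.7 * y[t - 1] + rng.normal()  # simulated AR(1) series

h = 4  # desired forecast horizon

# Iterated approach: fit a one-step-ahead model, then extrapolate it recursively.
phi_1 = np.linalg.lstsq(y[:-1, None], y[1:], rcond=None)[0][0]
iterated = y[-1]
for _ in range(h):
    iterated = phi_1 * iterated

# Direct approach: regress y[t + h] on y[t] and forecast the horizon in one step.
phi_h = np.linalg.lstsq(y[:-h, None], y[h:], rcond=None)[0][0]
direct = phi_h * y[-1]

print(f"iterated {h}-step forecast: {iterated:.3f}, direct: {direct:.3f}")
```
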
References

  • Altman, E. I., Marco, G., & Varetto, F. (1994). Corporate distress diagnosis: Comparisons using linear discriminant analysis and neural networks (the Italian experience). Journal of Banking & Finance, 18(3), 505–529.

  • Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems (pp. 153–160).

  • Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166. Retrieved from http://www.comp.hkbu.edu.hk/~markus/teaching/comp7650/tnn-94-gradient.pdf

  • Bland, R. (1998). Learning xor: Exploring the space of a classic problem. Stirling: Department of Computing Science and Mathematics, University of Stirling.

  • Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint, 1406.1078.

  • Cook, T. R., & Hall, A. S. (2017). Macroeconomic indicator forecasting with deep neural networks. Federal Reserve Bank of Kansas City Research Working Paper 17-11.

  • Cybenko, G. (1989). Approximations by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2, 183–192.

  • Dauphin, Y. N., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., & Bengio, Y. (2014). Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems (pp. 2933–2941).

  • Dijk, D. v., Teräsvirta, T., & Franses, P. H. (2002). Smooth transition autoregressive models—A survey of recent developments. Econometric Reviews, 21(1), 1–47.

  • Dixon, M., Klabjan, D., & Bang, J. H. (2017). Classification-based financial markets prediction using deep neural networks. Algorithmic Finance, 6(3–4), 67–77.

  • Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(7), 2121–2159.

  • Gal, Y., & Ghahramani, Z. (2016). Dropout as a Bayesian approximation. In Proceedings of the 33rd International Conference on Machine Learning (Vol. 3, pp. 1661–1680).

  • Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10), 2451–2471.

  • Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (pp. 249–256). Retrieved from http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf

  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. Cambridge: MIT Press. http://www.deeplearningbook.org

  • Hastad, J. (1986). Almost optimal lower bounds for small depth circuits. In Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing (pp. 6–20).

  • Heaton, J., Polson, N. G., & Witte, J. H. (2016). Deep learning in finance. arXiv preprint, 1602.06561.

  • Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554.

  • Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.

  • Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2), 251–257.

  • Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366.

  • Huang, G.-B. (2003). Learning capability and storage capacity of two-hidden-layer feedforward networks. IEEE Transactions on Neural Networks, 14(2), 274–281.

  • Huang, G.-B., & Babri, H. A. (1997). General approximation theorem on feedforward networks. In Proceedings of the 1997 International Conference on Information, Communications and Signal Processing (Vol. 2, pp. 698–702). Piscataway: IEEE.

  • Jothimani, D., Yadav, S. S., & Shankar, R. (2015). Discrete wavelet transform-based prediction of stock index: A study on national stock exchange fifty index. Journal of Financial Management and Analysis, 28(2), 35–42.

  • Karlik, B., & Olgac, A. V. (2011). Performance analysis of various activation functions in generalized MLP architectures of neural networks. International Journal of Artificial Intelligence and Expert Systems, 1(4), 111–122. Retrieved from https://www.researchgate.net/publication/228813985_Performance_Analysis_of_Various_Activation_Functions_in_Generalized_MLP_Architectures_of_Neural_Networks

  • Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint, 1412.6980.

  • Kristjanpoller, W., & Minutolo, M. C. (2015). Gold price volatility: A forecasting approach using the artificial neural network–GARCH model. Expert Systems with Applications, 42(20), 7245–7251.

  • Krogh, A., & Hertz, J. A. (1992). A simple weight decay can improve generalization. In Advances in Neural Information Processing Systems (pp. 950–957).

  • Lineesh, M., Minu, K., & John, C. J. (2010). Analysis of nonstationary nonlinear economic time series of gold price: A comparative study. In International Mathematical Forum (Vol. 5, 34, pp. 1673–1683). Citeseer.

  • Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the 30th International Conference on Machine Learning (Vol. 30, 1, p. 3). Retrieved from http://robotics.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf

  • Marcellino, M., Stock, J. H., & Watson, M. W. (2006). A comparison of direct and iterated multistep AR methods for forecasting macroeconomic time series. Journal of Econometrics, 135, 499–526.

  • McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), 115–133.

  • McNelis, P. (2005). Neural networks in finance: Gaining predictive edge in the market. Amsterdam: Elsevier.

  • Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry (Vol. 200, pp. 355–368). Cambridge: MIT Press.

  • Minu, K., Lineesh, M., & John, C. J. (2010). Wavelet neural networks for nonlinear time series analysis. Applied Mathematical Sciences, 4(50), 2485–2495.

  • Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (pp. 807–814). Retrieved from http://www.cs.toronto.edu/~fritz/absps/reluICML.pdf

  • Odom, M. D., & Sharda, R. (1990). A neural network model for bankruptcy prediction. In Proceedings of the 1990 International Joint Conference on Neural Networks (pp. 163–168). Piscataway: IEEE.

  • Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Searching for activation functions. arXiv preprint, 1710.05941. Retrieved from https://arxiv.org/pdf/1710.05941

  • Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv preprint, 1609.04747.

  • Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1985). Learning internal representations by error propagation. San Diego: California University, La Jolla Institute for Cognitive Science.

  • Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536. Retrieved from http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf

  • Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., … Bernstein, M., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.

  • Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929–1958.

  • Stark, T. (2017). Error statistics for the survey of professional forecasters for unemployment rate. Philadelphia: Federal Reserve Bank of Philadelphia. Retrieved from https://www.philadelphiafed.org/-/media/research-and-data/real-time-center/survey-of-professional-forecasters/data-files/unemp/spf_error_statistics_unemp_1_aic.pdf?la=en

  • Sussillo, D., & Abbott, L. (2014). Random walk initialization for training very deep feedforward networks. arXiv preprint, 1412.6558.

  • Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (pp. 3104–3112).

  • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D.,… Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–9).

  • Tam, K. Y. (1991). Neural network models and the prediction of bank bankruptcy. Omega, 19(5), 429–445.

  • Telgarsky, M. (2016). Benefits of depth in neural networks. arXiv preprint, 1602.04485.

  • Terasvirta, T., & Anderson, H. M. (1992). Characterizing nonlinearities in business cycles using smooth transition autoregressive models. Journal of Applied Econometrics, 7(S1), S119–S136.

  • Werbos, P. J. (1990). Backpropagation through time: What it does and how to do it. Proceedings of the IEEE, 78(10), 1550–1560.

  • Zou, D., Cao, Y., Zhou, D., & Gu, Q. (2018). Stochastic gradient descent optimizes over-parameterized deep ReLU networks. arXiv preprint, 1811.08888.

Author information

Correspondence to Thomas R. Cook.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this chapter

Cook, T.R. (2020). Neural Networks. In: Fuleky, P. (eds) Macroeconomic Forecasting in the Era of Big Data. Advanced Studies in Theoretical and Applied Econometrics, vol 52. Springer, Cham. https://doi.org/10.1007/978-3-030-31150-6_6
