Neural Networks

  • Thomas R. Cook
Part of the Advanced Studies in Theoretical and Applied Econometrics book series (ASTA, volume 52)


In the past 10 years, neural networks have emerged as a powerful tool for predictive modeling with “big data.” This chapter discusses the potential role of neural networks in economic forecasting. It begins with a brief discussion of the history of neural networks, their use in economics, and their value as universal function approximators. It then introduces the elemental structures of neural networks, taking the classic feedforward, fully connected neural network as its point of reference. A broad set of design decisions is discussed, including regularization, activation functions, and model architecture. Following this, two additional types of neural network model are discussed: recurrent neural networks and encoder-decoder models. The chapter concludes with an empirical application of all three models to the task of forecasting unemployment.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Federal Reserve Bank of Kansas City, Kansas City, USA
