Neural Networks

Part of the book series: Advanced Studies in Theoretical and Applied Econometrics (ASTA, volume 52)

Abstract

In the past 10 years, neural networks have emerged as a powerful tool for predictive modeling with “big data.” This chapter discusses the potential role of neural networks as applied to economic forecasting. It begins with a brief discussion of the history of neural networks, their use in economics, and their value as universal function approximators. It proceeds to introduce the elemental structures of neural networks, taking the classic feed-forward, fully connected type of neural network as its point of reference. A broad set of design decisions is discussed, including regularization, activation functions, and model architecture. Following this, two additional types of neural network model are discussed: recurrent neural networks and encoder-decoder models. The chapter concludes with an empirical application of all three models to the task of forecasting unemployment.

The views expressed are those of the author and do not necessarily reflect the views of the Federal Reserve Bank of Kansas City or the Federal Reserve System.
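
The chapter takes the classic feed-forward, fully connected network as its point of reference. As a rough, self-contained sketch of that model class (illustrative only and not code from the chapter; the toy series, single hidden layer, ReLU activation, and learning rate are all assumptions made here), the snippet below fits such a network to lagged values of a series and produces a one-step-ahead forecast:

```python
import numpy as np

# Illustrative sketch only: a one-hidden-layer, fully connected network trained
# by gradient descent to map p lagged values of a series onto its next value.
# The toy series, layer width, ReLU activation, and learning rate are
# assumptions for demonstration, not settings taken from the chapter.
rng = np.random.default_rng(0)

def make_lagged(y, p):
    """Stack p lagged values as inputs X and the next observation as target t."""
    X = np.column_stack([y[i:len(y) - p + i] for i in range(p)])
    return X, y[p:]

y = np.sin(np.linspace(0, 20, 400)) + 0.1 * rng.normal(size=400)  # stand-in series
X, t = make_lagged(y, p=4)

n_hidden = 16
W1 = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

lr = 1e-2
for _ in range(2000):
    h = np.maximum(X @ W1 + b1, 0.0)      # hidden layer with ReLU activation
    yhat = (h @ W2 + b2).ravel()          # linear output node for regression
    err = yhat - t                        # drives the mean-squared-error gradient
    gW2 = h.T @ err[:, None] / len(t)
    gb2 = err.mean(keepdims=True)
    dh = (err[:, None] @ W2.T) * (h > 0)  # backpropagate through the ReLU
    gW1 = X.T @ dh / len(t)
    gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# One-step-ahead forecast from the four most recent observations
one_step_forecast = (np.maximum(y[-4:][None, :] @ W1 + b1, 0.0) @ W2 + b2).item()
```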

Notes

  1. Specifically, a subset of the ImageNet dataset. See Russakovsky et al. (2015).

  2. We use log-loss because it is the convention (in both the machine learning and statistical literature) for this type of categorization problem. Other loss functions, including mean squared error, would likely work as well; a brief illustration comparing the two appears after these notes.

  3. See Bland (1998) and Rumelhart et al. (1985) for further discussion.

  4. In this setting, the goal of the model fitting process would be to minimize this objective function.

  5. An alternative form of the wavelet neural network uses wavelet functions as activation functions for hidden nodes in the network. This form of wavelet network, however, is designed to improve optimization speed, create self-assembling networks, or achieve ends other than accommodating non-stationary data.

  6. We focus here on a sequence of scalar values. All discussion in this section extends to sequences of multi-dimensional input (e.g., a sequence of vectors).

  7. In other words, the output of the decoder LSTM prior to the final, fully connected layer.

  8. This is to be contrasted with an iterative model, in which the next step ahead is forecast and iterative extrapolation is then used to generate a prediction for the desired forecast horizon; a sketch contrasting the two approaches appears after these notes.
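
As mentioned in note 2, log-loss is the conventional objective for this type of categorization problem, though mean squared error would likely work as well. A small illustration (not code from the chapter; the binary labels and predicted probabilities are made up) evaluating both criteria on the same predictions:

```python
import numpy as np

# Made-up binary labels and predicted class probabilities, for illustration only
y = np.array([1, 0, 1, 1, 0])
p = np.array([0.9, 0.2, 0.6, 0.8, 0.1])

eps = 1e-12  # guards against taking log(0)
log_loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
mse = np.mean((y - p) ** 2)
print(f"log-loss: {log_loss:.4f}, mean squared error: {mse:.4f}")
```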

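As distinguished in note 8, a direct model forecasts the target horizon in one step, while an iterative model forecasts the next step ahead and extrapolates recursively (see Marcellino et al. 2006). The sketch below contrasts the two using a least-squares AR(1) as a stand-in for a neural network forecaster; the simulated series, horizon, and estimator are illustrative assumptions only:

```python
import numpy as np

# Contrast between iterated and direct h-step-ahead forecasts, using a simple
# AR(1) fit by least squares as a stand-in for a neural network forecaster.
rng = np.random.default_rng(1)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.7 * y[t - 1] + rng.normal()  # simulated AR(1) series

h = 4  # desired forecast horizon

# Iterated approach: fit a one-step-ahead model, then extrapolate it recursively.
phi_1 = np.linalg.lstsq(y[:-1, None], y[1:], rcond=None)[0][0]
iterated = y[-1]
for _ in range(h):
    iterated = phi_1 * iterated

# Direct approach: regress y[t + h] on y[t] and forecast the horizon in one step.
phi_h = np.linalg.lstsq(y[:-h, None], y[h:], rcond=None)[0][0]
direct = phi_h * y[-1]

print(f"iterated {h}-step forecast: {iterated:.3f}, direct: {direct:.3f}")
```
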
References

  • Altman, E. I., Marco, G., & Varetto, F. (1994). Corporate distress diagnosis: Comparisons using linear discriminant analysis and neural networks (the Italian experience). Journal of Banking & Finance, 18(3), 505–529.

  • Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems (pp. 153–160).

  • Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166. Retrieved from http://www.comp.hkbu.edu.hk/~markus/teaching/comp7650/tnn-94-gradient.pdf

  • Bland, R. (1998). Learning xor: Exploring the space of a classic problem. Stirling: Department of Computing Science and Mathematics, University of Stirling.

  • Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint, 1406.1078.

  • Cook, T. R., & Hall, A. S. (2017). Macroeconomic indicator forecasting with deep neural networks. Federal Reserve Bank of Kansas City Research Working Paper 17-11.

  • Cybenko, G. (1989). Approximations by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2, 183–192.

  • Dauphin, Y. N., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., & Bengio, Y. (2014). Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems (pp. 2933–2941).

  • Dijk, D. v., Teräsvirta, T., & Franses, P. H. (2002). Smooth transition autoregressive models—A survey of recent developments. Econometric Reviews, 21(1), 1–47.

  • Dixon, M., Klabjan, D., & Bang, J. H. (2017). Classification-based financial markets prediction using deep neural networks. Algorithmic Finance, 6(3–4), 67–77.

  • Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(7), 2121–2159.

  • Gal, Y., & Ghahramani, Z. (2016). Dropout as a Bayesian approximation. In Proceedings of the 33rd International Conference on Machine Learning (Vol. 3, pp. 1661–1680).

  • Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10), 2451–2471.

  • Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (pp. 249–256). Retrieved from http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf

  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. Cambridge: MIT Press. http://www.deeplearningbook.org

  • Hastad, J. (1986). Almost optimal lower bounds for small depth circuits. In Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing (pp. 6–20).

  • Heaton, J., Polson, N. G., & Witte, J. H. (2016). Deep learning in finance. arXiv preprint, 1602.06561.

  • Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554.

  • Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.

  • Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2), 251–257.

  • Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366.

  • Huang, G.-B. (2003). Learning capability and storage capacity of two-hidden-layer feedforward networks. IEEE Transactions on Neural Networks, 14(2), 274–281.

  • Huang, G.-B., & Babri, H. A. (1997). General approximation theorem on feedforward networks. In Proceedings of the 1997 International Conference on Information, Communications and Signal Processing (Vol. 2, pp. 698–702). Piscataway: IEEE.

  • Jothimani, D., Yadav, S. S., & Shankar, R. (2015). Discrete wavelet transform-based prediction of stock index: A study on national stock exchange fifty index. Journal of Financial Management and Analysis, 28(2), 35–42.

  • Karlik, B., & Olgac, A. V. (2011). Performance analysis of various activation functions in generalized MLP architectures of neural networks. International Journal of Artificial Intelligence and Expert Systems, 1(4), 111–122. Retrieved from https://www.researchgate.net/publication/228813985_Performance_Analysis_of_Various_Activation_Functions_in_Generalized_MLP_Architectures_of_Neural_Networks

  • Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint, 1412.6980.

  • Kristjanpoller, W., & Minutolo, M. C. (2015). Gold price volatility: A forecasting approach using the artificial neural network–GARCH model. Expert Systems with Applications, 42(20), 7245–7251.

  • Krogh, A., & Hertz, J. A. (1992). A simple weight decay can improve generalization. In Advances in Neural Information Processing Systems (pp. 950–957).

  • Lineesh, M., Minu, K., & John, C. J. (2010). Analysis of nonstationary nonlinear economic time series of gold price: A comparative study. In International Mathematical Forum (Vol. 5, 34, pp. 1673–1683). Citeseer.

  • Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the 30th International Conference on Machine Learning (Vol. 30, 1, p. 3). Retrieved from http://robotics.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf

  • Marcellino, M., Stock, J. H., & Watson, M. W. (2006). A comparison of direct and iterated multistep AR methods for forecasting macroeconomic time series. Journal of Econometrics, 135, 499–526.

  • McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), 115–133.

  • McNelis, P. (2005). Neural networks in finance: Gaining predictive edge in the market. Amsterdam: Elsevier.

  • Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry (Vol. 200, pp. 355–368). Cambridge: MIT Press.

  • Minu, K., Lineesh, M., & John, C. J. (2010). Wavelet neural networks for nonlinear time series analysis. Applied Mathematical Sciences, 4(50), 2485–2495.

  • Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (pp. 807–814). Retrieved from http://www.cs.toronto.edu/~fritz/absps/reluICML.pdf

  • Odom, M. D., & Sharda, R. (1990). A neural network model for bankruptcy prediction. In Proceedings of the 1990 International Joint Conference on Neural Networks (pp. 163–168). Piscataway: IEEE.

  • Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Searching for activation functions. arXiv preprint, 1710.05941. Retrieved from https://arxiv.org/pdf/1710.05941

  • Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv preprint, 1609.04747.

  • Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1985). Learning internal representations by error propagation. San Diego: California University, La Jolla Institute for Cognitive Science.

  • Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536. Retrieved from http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf

  • Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., … Bernstein, M., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.

  • Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929–1958.

  • Stark, T. (2017). Error statistics for the survey of professional forecasters for unemployment rate. Philadelphia: Federal Reserve Bank of Philadelphia. Retrieved from https://www.philadelphiafed.org/-/media/research-and-data/real-time-center/survey-of-professional-forecasters/data-files/unemp/spf_error_statistics_unemp_1_aic.pdf?la=en

  • Sussillo, D., & Abbott, L. (2014). Random walk initialization for training very deep feedforward networks. arXiv preprint, 1412.6558.

  • Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (pp. 3104–3112).

  • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D.,… Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–9).

  • Tam, K. Y. (1991). Neural network models and the prediction of bank bankruptcy. Omega, 19(5), 429–445.

  • Telgarsky, M. (2016). Benefits of depth in neural networks. arXiv preprint, 1602.04485.

  • Terasvirta, T., & Anderson, H. M. (1992). Characterizing nonlinearities in business cycles using smooth transition autoregressive models. Journal of Applied Econometrics, 7(S1), S119–S136.

  • Werbos, P. J. (1990). Backpropagation through time: What it does and how to do it. Proceedings of the IEEE, 78(10), 1550–1560.

  • Zou, D., Cao, Y., Zhou, D., & Gu, Q. (2018). Stochastic gradient descent optimizes over-parameterized deep ReLU networks. arXiv preprint, 1811.08888.

Author information

Correspondence to Thomas R. Cook.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this chapter

Cook, T.R. (2020). Neural Networks. In: Fuleky, P. (eds) Macroeconomic Forecasting in the Era of Big Data. Advanced Studies in Theoretical and Applied Econometrics, vol 52. Springer, Cham. https://doi.org/10.1007/978-3-030-31150-6_6
