Introduction

Machine Learning in Finance

Abstract

This chapter introduces the industry context for machine learning in finance, discussing the critical events that have shaped the finance industry's need for machine learning and the unique barriers to adoption. The finance industry has adopted machine learning to varying degrees of sophistication, and how it has been adopted is heavily fragmented by the academic disciplines underpinning the applications. We review some key mathematical examples that demonstrate the nature of machine learning and how it is used in practice, with a focus on building intuition for the more technical expositions in later chapters. In particular, we begin to address many finance practitioners' concerns that neural networks are a "black box" by showing how they are related to existing, well-established techniques such as linear regression, logistic regression, and autoregressive time series models. Such arguments are developed further in later chapters. This chapter also introduces reinforcement learning for finance and is followed by more in-depth case studies highlighting the design concepts and practical challenges of applying machine learning in practice.


Notes

  1. The model is referred to as non-parametric if the parameter space is infinite-dimensional and parametric if the parameter space is finite-dimensional.

  2. Photo: Jacobs, Konrad [CC BY-SA 2.0 de (https://creativecommons.org/licenses/by-sa/2.0/de/deed.en)].

  3. C. Shannon, A Mathematical Theory of Communication, The Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656, July and October 1948.

  4. Note that we do not treat the input as a layer, so there are L − 1 hidden layers and an output layer.

  5. While the functional form of the map is the same as in linear regression, neural networks do not assume a data-generation process, and hence inference is not identical to ordinary least squares regression (see the sketch following these notes).

  6. A wealth process is self-financing if, at each time step, any purchase of an additional quantity of the risky asset is funded from the bank account; vice versa, any proceeds from a sale of some quantity of the asset go to the bank account (see the worked equation following these notes).

  7. Note, for the avoidance of doubt, that the risk-aversion parameter must be scaled by a factor of \(\frac{1}{2}\) to ensure consistency with the finance literature (see the worked equation following these notes).

  8. The question of how much data is needed to train a neural network is a central one, the immediate concern being insufficient data to avoid over-fitting. The amount of data needed is complex to assess; it depends in part on the number of edges in the network and can be gauged through bias–variance analysis, as described in a later chapter.

  9. Note that the composition of the S&P 500 changes over time, and so we should interpret a feature as a fixed symbol.

  10. The strategy refers to the choice of weight \(w\) when Player 2 chooses a payoff \(V = wV_1 + (1-w)V_2\), i.e., a weighted combination of the payoffs \(V_1\) and \(V_2\).
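
As a concrete illustration of Notes 5–7, the following sketches are ours rather than the chapter's; the notation \(a_t\) (holding in the risky asset), \(S_t\) (asset price), \(r\) (risk-free rate), \(W_t\) (wealth), and \(\lambda\) (risk aversion) is assumed. The self-financing condition of Note 6 states that any change in the position is funded from the bank account, which earns the risk-free rate:

$$\displaystyle W_{t+1} = a_t S_{t+1} + (1+r)(W_t - a_t S_t), $$

and the scaling of Note 7 corresponds to writing the mean–variance objective as

$$\displaystyle \max_{a}\; \mathbb{E}[W_T] - \frac{\lambda}{2}\,\mathrm{Var}[W_T]. $$

For Note 5, here is a minimal Python sketch (the data and dimensions are assumptions) showing that a network with no hidden layers and identity activation has exactly the functional form of linear regression, while fitting its weights by gradient descent rather than by assuming a data-generation process:

import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 3))]  # design matrix with intercept
y = X @ np.array([1.0, 0.5, -2.0, 0.3]) + 0.1 * rng.normal(size=100)

w_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS estimate of the same form

w_nn = np.zeros(4)                 # "network" weights: f(X) = Xw, no hidden layer
for _ in range(5000):              # gradient descent on the squared loss
    w_nn -= 0.05 * X.T @ (X @ w_nn - y) / len(y)

print(np.allclose(w_nn, w_ols, atol=1e-3))  # True: same functional form and fit,
                                            # but no Gaussian noise assumption made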

References

  • Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle (pp. 267–281).

  • Akcora, C. G., Dixon, M. F., Gel, Y. R., & Kantarcioglu, M. (2018). Bitcoin risk modeling with blockchain graphs. Economics Letters, 173(C), 138–142.

  • Arnold, V. I. (1957). On functions of three variables (Vol. 114, pp. 679–681).

  • Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31, 307–327.

  • Box, G. E. P., & Jenkins, G. M. (1976). Time series analysis, forecasting, and control. San Francisco: Holden-Day.

  • Box, G. E. P., Jenkins, G. M., & Reinsel, G. C. (1994). Time series analysis, forecasting, and control (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.

  • Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199–231.

  • Cont, R., & de Larrard, A. (2013). Price dynamics in a Markovian limit order market. SIAM Journal on Financial Mathematics, 4(1), 1–25.

  • de Prado, M. L. (2018). Advances in financial machine learning. Wiley.

  • de Prado, M. L. (2019). Beyond econometrics: A roadmap towards financial machine learning. SSRN. Available at SSRN: https://ssrn.com/abstract=3365282 or http://dx.doi.org/10.2139/ssrn.3365282.

  • DeepMind (2016). DeepMind AI reduces Google data centre cooling bill by 40%. https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/.

  • DeepMind (2017). The story of AlphaGo so far. https://deepmind.com/research/alphago/.

  • Dhar, V. (2013, December). Data science and prediction. Communications of the ACM, 56(12), 64–73.

  • Dixon, M. (2018a). A high frequency trade execution model for supervised learning. High Frequency, 1(1), 32–52.

  • Dixon, M. (2018b). Sequence classification of the limit order book using recurrent neural networks. Journal of Computational Science, 24, 277–286.

  • Dixon, M., & Halperin, I. (2019). The four horsemen of machine learning in finance.

  • Dixon, M., Polson, N., & Sokolov, V. (2018). Deep learning for spatio-temporal modeling: Dynamic traffic flows and high frequency trading. Applied Stochastic Models in Business and Industry.

  • Dixon, M. F., & Polson, N. G. (2019, March). Deep fundamental factor models. arXiv e-prints, arXiv:1903.07677.

  • Dyhrberg, A. (2016). Bitcoin, gold and the dollar – a GARCH volatility analysis. Finance Research Letters.

  • Elman, J. L. (1991, September). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7(2), 195–225.

  • Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M., et al. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639), 115–118.

  • Flood, M., Jagadish, H. V., & Raschid, L. (2016). Big data challenges and opportunities in financial stability monitoring. Financial Stability Review, (20), 129–142.

  • Gomber, P., Koch, J.-A., & Siering, M. (2017). Digital finance and fintech: Current research and future research directions. Journal of Business Economics, 87(5), 537–580.

  • Gottlieb, O., Salisbury, C., Shek, H., & Vaidyanathan, V. (2006). Detecting corporate fraud: An application of machine learning. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.7470.

  • Graves, A. (2012). Supervised sequence labelling with recurrent neural networks. Studies in Computational Intelligence. Heidelberg, New York: Springer.

  • Gu, S., Kelly, B. T., & Xiu, D. (2018). Empirical asset pricing via machine learning. Chicago Booth Research Paper 18-04.

  • Harvey, C. R., Liu, Y., & Zhu, H. (2016). …and the cross-section of expected returns. The Review of Financial Studies, 29(1), 5–68.

  • Hornik, K., Stinchcombe, M., & White, H. (1989, July). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366.

  • Kearns, M., & Nevmyvaka, Y. (2013). Machine learning for market microstructure and high frequency trading. In High frequency trading – New realities for traders.

  • Kercheval, A., & Zhang, Y. (2015). Modeling high-frequency limit order book dynamics with support vector machines. Journal of Quantitative Finance, 15(8), 1315–1329.

  • Kolmogorov, A. N. (1957). On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. Dokl. Akad. Nauk SSSR, 114, 953–956.

  • Kubota, T. (2017, January). Artificial intelligence used to identify skin cancer.

  • Kullback, S., & Leibler, R. A. (1951, March). On information and sufficiency. Annals of Mathematical Statistics, 22(1), 79–86.

  • McCarthy, J., Minsky, M. L., Rochester, N., & Shannon, C. E. (1955, August). A proposal for the Dartmouth summer research project on artificial intelligence. http://www-formal.stanford.edu/jmc/history/dartmouth/dartmouth.html.

  • Philipp, G., & Carbonell, J. G. (2017, December). Nonparametric neural networks. arXiv e-prints, arXiv:1712.05440.

  • Philippon, T. (2016). The fintech opportunity. CEPR Discussion Papers 11409, C.E.P.R. Discussion Papers.

  • Pinar Saygin, A., Cicekli, I., & Akman, V. (2000, November). Turing test: 50 years later. Minds and Machines, 10(4), 463–518.

  • Poggio, T. (2016). Deep learning: Mathematics and neuroscience. In A sponsored supplement to Science, Brain-inspired intelligent robotics: The intersection of robotics and neuroscience (pp. 9–12).

  • Shannon, C. (1948). A mathematical theory of communication. Bell System Technical Journal, 27.

  • Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

  • Sirignano, J., Sadhwani, A., & Giesecke, K. (2016, July). Deep learning for mortgage risk. arXiv e-prints.

  • Sirignano, J. A. (2016). Deep learning for limit order books. arXiv preprint arXiv:1601.01987.

  • Sovbetov, Y. (2018). Factors influencing cryptocurrency prices: Evidence from Bitcoin, Ethereum, Dash, Litcoin, and Monero. Journal of Economics and Financial Analysis, 2(2), 1–27.

  • Stein, H. (2012). Counterparty risk, CVA, and Basel III.

  • Turing, A. M. (1995). Computing machinery and intelligence. In Computers & thought (pp. 11–35). Cambridge, MA: MIT Press.

  • Wiener, N. (1964). Extrapolation, interpolation, and smoothing of stationary time series. The MIT Press.


Appendix


1.1 Answers to Multiple Choice Questions

Question 1

Answer: 1, 2.

Answer 3 is incorrect. While it is true that unsupervised learning does not require a human supervisor to train the model, it is false to presume that the approach is superior.

Answer 4 is incorrect. Reinforcement learning cannot be viewed as a generalization of supervised learning to Markov decision processes, because it uses rewards to reinforce decisions rather than labels that define the correct decision. In this sense, reinforcement learning uses a weaker form of supervision.
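
To make the distinction concrete, here is a minimal Python sketch (ours, not the chapter's; the function names and parameters are illustrative). A supervised update is driven by a label that specifies the correct output, whereas a one-step Q-learning update is driven only by a scalar reward:

import numpy as np

# Supervised learning: the label y defines the correct output, so the
# error signal is direct.
def supervised_update(theta, x, y, lr=0.1):
    return theta - lr * (theta @ x - y) * x

# Reinforcement learning: after taking action a in state s, only a scalar
# reward r is observed; the correct action is never revealed, which is why
# the supervision is weaker.
def q_update(Q, s, a, r, s_next, lr=0.1, gamma=0.9):
    Q[s, a] += lr * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q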

Question 2

Answer: 1, 2, 3.

Answer 4 is incorrect. Two separate binary models \(\{g^{(1)}_i(X \mid \theta_1)\}_{i=0}^{1}\) and \(\{g^{(2)}_i(X \mid \theta_2)\}_{i=0}^{1}\) will, in general, not produce the same output as a single multi-class model \(\{g_i(X \mid \boldsymbol{\theta}')\}_{i=0}^{3}\). Consider, as a counterexample, the logistic models \(g^{(1)}_0=g_0(X \mid \theta_1)=\frac{\exp\{-X^T\theta_1\}}{1+\exp\{-X^T\theta_1\}}\) and \(g^{(2)}_0=g_0(X \mid \theta_2)=\frac{\exp\{-X^T\theta_2\}}{1+\exp\{-X^T\theta_2\}}\), compared with the multi-class model

$$\displaystyle \begin{aligned} g_i(X \mid \boldsymbol{\theta}')=\text{softmax}(X^T\boldsymbol{\theta}')_i=\frac{\exp\{(X^T\boldsymbol{\theta}')_i\}}{\sum_{k=0}^K\exp\{(X^T\boldsymbol{\theta}')_k\}}. \end{aligned} $$
(1.26)

If we set \(\theta_1=\boldsymbol{\theta}'_0-\boldsymbol{\theta}'_1\) and \(\boldsymbol{\theta}'_2=\boldsymbol{\theta}'_3=0\), then the multi-class model is equivalent to Model 1. Similarly, if we set \(\theta_2=\boldsymbol{\theta}'_2-\boldsymbol{\theta}'_3\) and \(\boldsymbol{\theta}'_0=\boldsymbol{\theta}'_1=0\), then the multi-class model is equivalent to Model 2. However, we cannot simultaneously match the outputs of Model 1 and Model 2 with the multi-class model.
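
As a numerical check of this argument, the following Python sketch (ours; the sample data and seed are arbitrary) confirms that restricting the softmax to classes 0 and 1 yields a binary logistic model whose parameter is the difference of the corresponding columns of \(\boldsymbol{\theta}'\); each binary model therefore pins down only a difference of columns, which is why two unrelated binary models cannot be matched simultaneously:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))        # five sample points, three features
theta_p = rng.normal(size=(3, 4))  # multi-class parameters theta' (four classes)

# Multi-class softmax model, as in Eq. (1.26)
logits = X @ theta_p
G = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Restricted to classes 0 and 1, the softmax is logistic in theta'_0 - theta'_1
ratio = G[:, 0] / (G[:, 0] + G[:, 1])
logistic = 1.0 / (1.0 + np.exp(-X @ (theta_p[:, 0] - theta_p[:, 1])))
print(np.allclose(ratio, logistic))  # True: only differences of columns matter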

Question 3

Answer: 1, 2, 3.

Answer 4 is incorrect. The layers in a deep recurrent network provide more expressiveness in the map between each lagged input and the hidden state variable, but they are unrelated to the amount of memory in the network. The hidden layers of a multilayer perceptron are not the hidden state variables of the time series model; it is the degree of unfolding, i.e., the number of hidden state vectors, that determines the amount of memory in a recurrent network (see the sketch below).
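
A minimal numpy sketch (ours; the Elman-style cell and the dimensions are assumptions) makes this concrete: the memory is the chain of hidden states produced by unfolding the cell over T time steps, so T, not the number of stacked layers, determines how far back the network remembers:

import numpy as np

def rnn_unroll(x_seq, W_h, W_x, b):
    # Unfold a plain recurrent cell over the sequence; each hidden state
    # summarizes all inputs seen so far.
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in x_seq:
        h = np.tanh(W_h @ h + W_x @ x_t + b)
        states.append(h)
    return states

rng = np.random.default_rng(1)
T, d, m = 6, 2, 4                   # six lags, two features, four hidden units
states = rnn_unroll(rng.normal(size=(T, d)),
                    0.5 * rng.normal(size=(m, m)),   # hidden-to-hidden weights
                    rng.normal(size=(m, d)),         # input-to-hidden weights
                    np.zeros(m))
print(len(states))                  # T hidden state vectors: the degree of unfolding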

Question 4

Answer: 2.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Dixon, M.F., Halperin, I., Bilokon, P. (2020). Introduction. In: Machine Learning in Finance. Springer, Cham. https://doi.org/10.1007/978-3-030-41068-1_1
