Introduction

Machine Learning in Finance

Abstract

This chapter introduces the industry context for machine learning in finance, discussing the critical events that have shaped the finance industry's need for machine learning and the unique barriers to adoption. The finance industry has adopted machine learning to varying degrees of sophistication, and how it has been adopted is heavily fragmented by the academic disciplines underpinning the applications. We review some key mathematical examples that demonstrate the nature of machine learning and how it is used in practice, with a focus on building intuition for the more technical expositions in later chapters. In particular, we begin to address many finance practitioners' concerns that neural networks are a "black box" by showing how they are related to existing, well-established techniques such as linear regression, logistic regression, and autoregressive time series models. Such arguments are developed further in later chapters. This chapter also introduces reinforcement learning for finance and is followed by more in-depth case studies highlighting the design concepts and practical challenges of applying machine learning in practice.


Notes

  1. The model is referred to as non-parametric if the parameter space is infinite-dimensional and parametric if the parameter space is finite-dimensional.

  2. Photo: Jacobs, Konrad [CC BY-SA 2.0 de (https://creativecommons.org/licenses/by-sa/2.0/de/deed.en)].

  3. C. Shannon, A Mathematical Theory of Communication, The Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656, July and October 1948.

  4. Note that we do not treat the input as a layer, so there are L − 1 hidden layers and an output layer.

  5. While the functional form of the map is the same as in linear regression, neural networks do not assume a data-generation process, and hence inference is not identical to ordinary least squares regression (see the sketch following these notes).

  6. A wealth process is self-financing if, at each time step, any purchase of an additional quantity of the risky asset is funded from the bank account; vice versa, any proceeds from a sale of some quantity of the asset go to the bank account (see the worked equation following these notes).

  7. Note, for the avoidance of doubt, that the risk-aversion parameter must be scaled by a factor of \(\frac{1}{2}\) to ensure consistency with the finance literature (see the worked equation following these notes).

  8. The question of how much data is needed to train a neural network is a central one, the immediate concern being insufficient data to avoid over-fitting. The amount of data needed is complex to assess; it depends in part on the number of edges in the network and can be gauged through bias–variance analysis, as described in a later chapter.

  9. Note that the composition of the S&P 500 changes over time, and so we should interpret a feature as a fixed symbol.

  10. The strategy refers to the choice of weight \(w\) when Player 2 chooses a payoff \(V = wV_1 + (1-w)V_2\), i.e., a weighted combination of the payoffs \(V_1\) and \(V_2\).
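
As a concrete illustration of Notes 5–7, the following sketches are ours rather than the chapter's; the notation \(a_t\) (holding in the risky asset), \(S_t\) (asset price), \(r\) (risk-free rate), \(W_t\) (wealth), and \(\lambda\) (risk aversion) is assumed. The self-financing condition of Note 6 states that any change in the position is funded from the bank account, which earns the risk-free rate:

$$\displaystyle W_{t+1} = a_t S_{t+1} + (1+r)(W_t - a_t S_t), $$

and the scaling of Note 7 corresponds to writing the mean–variance objective as

$$\displaystyle \max_{a}\; \mathbb{E}[W_T] - \frac{\lambda}{2}\,\mathrm{Var}[W_T]. $$

For Note 5, here is a minimal Python sketch (the data and dimensions are assumptions) showing that a network with no hidden layers and identity activation has exactly the functional form of linear regression, while fitting its weights by gradient descent rather than by assuming a data-generation process:

import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 3))]  # design matrix with intercept
y = X @ np.array([1.0, 0.5, -2.0, 0.3]) + 0.1 * rng.normal(size=100)

w_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS estimate of the same form

w_nn = np.zeros(4)                 # "network" weights: f(X) = Xw, no hidden layer
for _ in range(5000):              # gradient descent on the squared loss
    w_nn -= 0.05 * X.T @ (X @ w_nn - y) / len(y)

print(np.allclose(w_nn, w_ols, atol=1e-3))  # True: same functional form and fit,
                                            # but no Gaussian noise assumption made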

References

  • Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle (pp. 267–281).

  • Akcora, C. G., Dixon, M. F., Gel, Y. R., & Kantarcioglu, M. (2018). Bitcoin risk modeling with blockchain graphs. Economics Letters, 173(C), 138–142.

  • Arnold, V. I. (1957). On functions of three variables (Vol. 114, pp. 679–681).

  • Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31, 307–327.

  • Box, G. E. P., & Jenkins, G. M. (1976). Time series analysis, forecasting, and control. San Francisco: Holden-Day.

  • Box, G. E. P., Jenkins, G. M., & Reinsel, G. C. (1994). Time series analysis, forecasting, and control (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.

  • Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199–231.

  • Cont, R., & de Larrard, A. (2013). Price dynamics in a Markovian limit order market. SIAM Journal on Financial Mathematics, 4(1), 1–25.

  • de Prado, M. L. (2018). Advances in financial machine learning. Wiley.

  • de Prado, M. L. (2019). Beyond econometrics: A roadmap towards financial machine learning. SSRN. Available at SSRN: https://ssrn.com/abstract=3365282 or http://dx.doi.org/10.2139/ssrn.3365282.

  • DeepMind (2016). DeepMind AI reduces Google data centre cooling bill by 40%. https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/.

  • DeepMind (2017). The story of AlphaGo so far. https://deepmind.com/research/alphago/.

  • Dhar, V. (2013, December). Data science and prediction. Communications of the ACM, 56(12), 64–73.

  • Dixon, M. (2018a). A high frequency trade execution model for supervised learning. High Frequency, 1(1), 32–52.

  • Dixon, M. (2018b). Sequence classification of the limit order book using recurrent neural networks. Journal of Computational Science, 24, 277–286.

  • Dixon, M., & Halperin, I. (2019). The four horsemen of machine learning in finance.

  • Dixon, M., Polson, N., & Sokolov, V. (2018). Deep learning for spatio-temporal modeling: Dynamic traffic flows and high frequency trading. Applied Stochastic Models in Business and Industry.

  • Dixon, M. F., & Polson, N. G. (2019, March). Deep fundamental factor models. arXiv e-prints, arXiv:1903.07677.

  • Dyhrberg, A. (2016). Bitcoin, gold and the dollar – a GARCH volatility analysis. Finance Research Letters.

  • Elman, J. L. (1991, September). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7(2), 195–225.

  • Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M., et al. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639), 115–118.

  • Flood, M., Jagadish, H. V., & Raschid, L. (2016). Big data challenges and opportunities in financial stability monitoring. Financial Stability Review, (20), 129–142.

  • Gomber, P., Koch, J.-A., & Siering, M. (2017). Digital finance and fintech: Current research and future research directions. Journal of Business Economics, 87(5), 537–580.

  • Gottlieb, O., Salisbury, C., Shek, H., & Vaidyanathan, V. (2006). Detecting corporate fraud: An application of machine learning. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.7470.

  • Graves, A. (2012). Supervised sequence labelling with recurrent neural networks. Studies in Computational Intelligence. Heidelberg, New York: Springer.

  • Gu, S., Kelly, B. T., & Xiu, D. (2018). Empirical asset pricing via machine learning. Chicago Booth Research Paper 18-04.

  • Harvey, C. R., Liu, Y., & Zhu, H. (2016). …and the cross-section of expected returns. The Review of Financial Studies, 29(1), 5–68.

  • Hornik, K., Stinchcombe, M., & White, H. (1989, July). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366.

  • Kearns, M., & Nevmyvaka, Y. (2013). Machine learning for market microstructure and high frequency trading. In High frequency trading – New realities for traders.

  • Kercheval, A., & Zhang, Y. (2015). Modeling high-frequency limit order book dynamics with support vector machines. Journal of Quantitative Finance, 15(8), 1315–1329.

  • Kolmogorov, A. N. (1957). On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. Dokl. Akad. Nauk SSSR, 114, 953–956.

  • Kubota, T. (2017, January). Artificial intelligence used to identify skin cancer.

  • Kullback, S., & Leibler, R. A. (1951, March). On information and sufficiency. Annals of Mathematical Statistics, 22(1), 79–86.

  • McCarthy, J., Minsky, M. L., Rochester, N., & Shannon, C. E. (1955, August). A proposal for the Dartmouth summer research project on artificial intelligence. http://www-formal.stanford.edu/jmc/history/dartmouth/dartmouth.html.

  • Philipp, G., & Carbonell, J. G. (2017, December). Nonparametric neural networks. arXiv e-prints, arXiv:1712.05440.

  • Philippon, T. (2016). The fintech opportunity. CEPR Discussion Papers 11409, C.E.P.R. Discussion Papers.

  • Pinar Saygin, A., Cicekli, I., & Akman, V. (2000, November). Turing test: 50 years later. Minds and Machines, 10(4), 463–518.

  • Poggio, T. (2016). Deep learning: Mathematics and neuroscience. In A sponsored supplement to Science, Brain-inspired intelligent robotics: The intersection of robotics and neuroscience (pp. 9–12).

  • Shannon, C. (1948). A mathematical theory of communication. Bell System Technical Journal, 27.

  • Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

  • Sirignano, J., Sadhwani, A., & Giesecke, K. (2016, July). Deep learning for mortgage risk. arXiv e-prints.

  • Sirignano, J. A. (2016). Deep learning for limit order books. arXiv preprint arXiv:1601.01987.

  • Sovbetov, Y. (2018). Factors influencing cryptocurrency prices: Evidence from Bitcoin, Ethereum, Dash, Litcoin, and Monero. Journal of Economics and Financial Analysis, 2(2), 1–27.

  • Stein, H. (2012). Counterparty risk, CVA, and Basel III.

  • Turing, A. M. (1995). Computing machinery and intelligence. In Computers & thought (pp. 11–35). Cambridge, MA: MIT Press.

  • Wiener, N. (1964). Extrapolation, interpolation, and smoothing of stationary time series. The MIT Press.


Appendix


1.1 Answers to Multiple Choice Questions

Question 1

Answer: 1, 2.

Answer 3 is incorrect. While it is true that unsupervised learning does not require a human supervisor to train the model, it is false to presume that the approach is superior.

Answer 4 is incorrect. Reinforcement learning cannot be viewed as a generalization of supervised learning to Markov decision processes, because it uses rewards to reinforce decisions rather than labels that define the correct decision. In this sense, reinforcement learning uses a weaker form of supervision.
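
To make the distinction concrete, here is a minimal Python sketch (ours, not the chapter's; the function names and parameters are illustrative). A supervised update is driven by a label that specifies the correct output, whereas a one-step Q-learning update is driven only by a scalar reward:

import numpy as np

# Supervised learning: the label y defines the correct output, so the
# error signal is direct.
def supervised_update(theta, x, y, lr=0.1):
    return theta - lr * (theta @ x - y) * x

# Reinforcement learning: after taking action a in state s, only a scalar
# reward r is observed; the correct action is never revealed, which is why
# the supervision is weaker.
def q_update(Q, s, a, r, s_next, lr=0.1, gamma=0.9):
    Q[s, a] += lr * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q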

Question 2

Answer: 1, 2, 3.

Answer 4 is incorrect. Two separate binary models \(\{g^{(1)}_i(X \mid \theta_1)\}_{i=0}^{1}\) and \(\{g^{(2)}_i(X \mid \theta_2)\}_{i=0}^{1}\) will, in general, not produce the same output as a single multi-class model \(\{g_i(X \mid \boldsymbol{\theta}')\}_{i=0}^{3}\). Consider, as a counterexample, the logistic models \(g^{(1)}_0=g_0(X \mid \theta_1)=\frac{\exp\{-X^T\theta_1\}}{1+\exp\{-X^T\theta_1\}}\) and \(g^{(2)}_0=g_0(X \mid \theta_2)=\frac{\exp\{-X^T\theta_2\}}{1+\exp\{-X^T\theta_2\}}\), compared with the multi-class model

$$\displaystyle \begin{aligned} g_i(X \mid \boldsymbol{\theta}')=\text{softmax}(X^T\boldsymbol{\theta}')_i=\frac{\exp\{(X^T\boldsymbol{\theta}')_i\}}{\sum_{k=0}^K\exp\{(X^T\boldsymbol{\theta}')_k\}}. \end{aligned} $$
(1.26)

If we set \(\theta_1=\boldsymbol{\theta}'_0-\boldsymbol{\theta}'_1\) and \(\boldsymbol{\theta}'_2=\boldsymbol{\theta}'_3=0\), then the multi-class model is equivalent to Model 1. Similarly, if we set \(\theta_2=\boldsymbol{\theta}'_2-\boldsymbol{\theta}'_3\) and \(\boldsymbol{\theta}'_0=\boldsymbol{\theta}'_1=0\), then the multi-class model is equivalent to Model 2. However, we cannot simultaneously match the outputs of Model 1 and Model 2 with the multi-class model.
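
As a numerical check of this argument, the following Python sketch (ours; the sample data and seed are arbitrary) confirms that restricting the softmax to classes 0 and 1 yields a binary logistic model whose parameter is the difference of the corresponding columns of \(\boldsymbol{\theta}'\); each binary model therefore pins down only a difference of columns, which is why two unrelated binary models cannot be matched simultaneously:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))        # five sample points, three features
theta_p = rng.normal(size=(3, 4))  # multi-class parameters theta' (four classes)

# Multi-class softmax model, as in Eq. (1.26)
logits = X @ theta_p
G = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Restricted to classes 0 and 1, the softmax is logistic in theta'_0 - theta'_1
ratio = G[:, 0] / (G[:, 0] + G[:, 1])
logistic = 1.0 / (1.0 + np.exp(-X @ (theta_p[:, 0] - theta_p[:, 1])))
print(np.allclose(ratio, logistic))  # True: only differences of columns matter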

Question 3

Answer: 1, 2, 3.

Answer 4 is incorrect. The layers in a deep recurrent network provide more expressiveness in the map between each lagged input and the hidden state variable, but they are unrelated to the amount of memory in the network. The hidden layers of a multilayer perceptron are not the hidden state variables of the time series model; it is the degree of unfolding, i.e., the number of hidden state vectors, that determines the amount of memory in a recurrent network (see the sketch below).
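
A minimal numpy sketch (ours; the Elman-style cell and the dimensions are assumptions) makes this concrete: the memory is the chain of hidden states produced by unfolding the cell over T time steps, so T, not the number of stacked layers, determines how far back the network remembers:

import numpy as np

def rnn_unroll(x_seq, W_h, W_x, b):
    # Unfold a plain recurrent cell over the sequence; each hidden state
    # summarizes all inputs seen so far.
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in x_seq:
        h = np.tanh(W_h @ h + W_x @ x_t + b)
        states.append(h)
    return states

rng = np.random.default_rng(1)
T, d, m = 6, 2, 4                   # six lags, two features, four hidden units
states = rnn_unroll(rng.normal(size=(T, d)),
                    0.5 * rng.normal(size=(m, m)),   # hidden-to-hidden weights
                    rng.normal(size=(m, d)),         # input-to-hidden weights
                    np.zeros(m))
print(len(states))                  # T hidden state vectors: the degree of unfolding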

Question 4

Answer: 2.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Dixon, M.F., Halperin, I., Bilokon, P. (2020). Introduction. In: Machine Learning in Finance. Springer, Cham. https://doi.org/10.1007/978-3-030-41068-1_1
