Abstract
This chapter introduces Bayesian regression and shows how it extends many of the concepts in the previous chapter. We develop kernel-based machine learning methods—specifically Gaussian process regression, an important class of Bayesian machine learning methods—and demonstrate their application to “surrogate” models of derivative prices. This chapter also provides a natural starting point from which to develop intuition for the role and functional form of regularization in a frequentist setting—the subject of subsequent chapters.
Notes
1. Surrogate models learn the output of an existing mathematical or statistical model as a function of input data.
2. Note that the factor of 2 in the denominator of the second term does not cancel out because the derivative is w.r.t. \(\sigma _n^2\) and not \(\sigma _n\).
3. This is in contrast to non-linear regressions commonly used in finance, which attempt to parameterize a non-linear function with a set of weights.
4. This choice is not a real limitation in practice (since it applies only to the prior) and does not prevent the mean of the predictor from being nonzero.
5. Gardner et al. (2018) explored five different approximation methods known in the numerical analysis literature.
6. Note that the plot uses the original coordinates and not the re-scaled coordinates.
7. Such maturities might correspond to exposure evaluation times in CVA simulation, as in Crépey and Dixon (2020). The option model and the GP model are observed to produce very similar values.
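The surrogate idea in note 1 can be sketched in a few lines of numpy: fit a noise-free GP to Black–Scholes call prices on a sparse strike grid and query it between grid points. The strike grid, length-scale, and jitter below are illustrative choices, not the chapter's settings; note also that the zero-mean prior does not stop the posterior mean from tracking the (nonzero) prices, as note 4 observes.

```python
import numpy as np
from math import log, sqrt, exp, erf

def bs_call(S, K, T, r, sigma):
    """Black-Scholes call price: the 'existing model' the surrogate learns."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF
    return S * Phi(d1) - K * exp(-r * T) * Phi(d2)

def rbf(a, b, ell=5.0):
    """Squared-exponential (RBF) kernel on strikes."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

# Train the surrogate on a sparse strike grid of model prices.
K_train = np.linspace(80.0, 120.0, 9)
y_train = np.array([bs_call(100.0, K, 1.0, 0.01, 0.2) for K in K_train])

# Noise-free GP regression: the posterior mean interpolates the pricer.
Kxx = rbf(K_train, K_train) + 1e-8 * np.eye(len(K_train))  # jitter for stability
K_test = np.array([85.0, 92.5, 100.0, 107.5, 115.0])
mu = rbf(K_test, K_train) @ np.linalg.solve(Kxx, y_train)

exact = np.array([bs_call(100.0, K, 1.0, 0.01, 0.2) for K in K_test])
print(np.max(np.abs(mu - exact)))  # surrogate tracks the pricer closely
```

Once trained, the surrogate replaces calls to the pricer with a cheap matrix-vector product, which is the motivation for using GPs in the CVA setting of note 7.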
References
Alvarez, M., Rosasco, L., & Lawrence, N. (2012). Kernels for vector-valued functions: A review. Foundations and Trends in Machine Learning, 4(3), 195–266.
Bishop, C. M. (2006). Pattern recognition and machine learning (Information Science and Statistics). Berlin, Heidelberg: Springer-Verlag.
Bonilla, E. V., Chai, K. M. A., & Williams, C. K. I. (2007). Multi-task Gaussian process prediction. In Proceedings of the 20th International Conference on Neural Information Processing Systems, NIPS’07, USA (pp. 153–160). Curran Associates Inc.
Chen, Z., Wang, B., & Gorban, A. N. (2017, March). Multivariate Gaussian and Student-t process regression for multi-output prediction. ArXiv e-prints.
Cousin, A., Maatouk, H., & Rullière, D. (2016). Kriging of financial term structures. European Journal of Operational Research, 255, 631–648.
Crépey, S., & Dixon, M. (2020). Gaussian process regression for derivative portfolio modeling and application to CVA computations. Computational Finance.
da Barrosa, M. R., Salles, A. V., & de Oliveira Ribeiro, C. (2016). Portfolio optimization through kriging methods. Applied Economics, 48(50), 4894–4905.
Fang, F., & Oosterlee, C. W. (2008). A novel pricing method for European options based on Fourier-cosine series expansions. SIAM Journal on Scientific Computing.
Gardner, J., Pleiss, G., Wu, R., Weinberger, K., & Wilson, A. (2018). Product kernel interpolation for scalable Gaussian processes. In International Conference on Artificial Intelligence and Statistics (pp. 1407–1416).
Gramacy, R., & Apley, D. (2015). Local Gaussian process approximation for large computer experiments. Journal of Computational and Graphical Statistics, 24(2), 561–578.
Bühler, H., Gonon, L., Teichmann, J., & Wood, B. (2018). Deep hedging. Quantitative Finance. Forthcoming (preprint version available as arXiv:1802.03042).
Hernandez, A. (2017). Model calibration with neural networks. Risk Magazine (June 1–5). Preprint version available at SSRN.2812140, code available at https://github.com/Andres-Hernandez/CalibrationNN.
Liu, M., & Staum, J. (2010). Stochastic kriging for efficient nested simulation of expected shortfall. Journal of Risk, 12(3), 3–27.
Ludkovski, M. (2018). Kriging metamodels and experimental design for Bermudan option pricing. Journal of Computational Finance, 22(1), 37–77.
MacKay, D. J. (1998). Introduction to Gaussian processes. In C. M. Bishop (Ed.), Neural networks and machine learning. Springer-Verlag.
Melkumyan, A., & Ramos, F. (2011). Multi-kernel Gaussian processes. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Two, IJCAI’11 (pp. 1408–1413). AAAI Press.
Micchelli, C. A., Xu, Y., & Zhang, H. (2006, December). Universal kernels. Journal of Machine Learning Research, 7, 2651–2667.
Murphy, K. (2012). Machine learning: A probabilistic perspective. The MIT Press.
Neal, R. M. (1996). Bayesian learning for neural networks. Volume 118 of Lecture Notes in Statistics. Springer.
Pillonetto, G., Dinuzzo, F., & Nicolao, G. D. (2010, February). Bayesian online multitask learning of Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(2), 193–205.
Rasmussen, C. E., & Ghahramani, Z. (2001). Occam’s razor. In Advances in Neural Information Processing Systems 13 (pp. 294–300). MIT Press.
Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. MIT Press.
Roberts, S., Osborne, M., Ebden, M., Reece, S., Gibson, N., & Aigrain, S. (2013). Gaussian processes for time-series modelling. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 371(1984).
Scholkopf, B., & Smola, A. J. (2001). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge, MA, USA: MIT Press.
Spiegeleer, J. D., Madan, D. B., Reyners, S., & Schoutens, W. (2018). Machine learning for quantitative finance: Fast derivative pricing, hedging and fitting. Quantitative Finance, 1–9.
Weinan, E., Han, J., & Jentzen, A. (2017). Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. arXiv:1706.04702.
Whittle, P., & Sargent, T. J. (1983). Prediction and regulation by linear least-square methods (NED - New edition ed.). University of Minnesota Press.
Appendix
Answers to Multiple Choice Questions
Question 1
Answer: 1, 4, 5.
Parametric Bayesian regression always treats the regression weights as random variables.
In Bayesian regression the data function f(x) is observed only if the data is assumed to be noise-free; otherwise, the function is not directly observed.
The posterior distribution of the parameters will only be Gaussian if both the prior and the likelihood function are Gaussian. The distribution of the likelihood function depends on the assumed error distribution.
The posterior distribution of the regression weights will typically contract with increasing data: the posterior precision matrix grows as data accumulate, and hence the posterior variance shrinks. There are exceptions if, for example, there are outliers in the data.
The mean of the posterior distribution depends on both the mean and covariance of the prior when the prior is Gaussian. We can see this from Eq. 3.19.
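The contraction argument can be checked numerically with the standard conjugate-Gaussian update for the weights. This is a sketch: the design matrix, prior, and noise level below are illustrative, and the update formulas are the usual conjugate ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior(X, y, m0, S0, noise_var):
    """Conjugate update for Bayesian linear regression: Gaussian prior
    N(m0, S0) on the weights, Gaussian noise with variance noise_var."""
    S0_inv = np.linalg.inv(S0)
    SN = np.linalg.inv(S0_inv + X.T @ X / noise_var)   # posterior covariance
    mN = SN @ (S0_inv @ m0 + X.T @ y / noise_var)      # posterior mean
    return mN, SN

def sample(n, noise_sd=0.3):
    """Synthetic data: intercept plus one feature, known true weights."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    return X, X @ np.array([1.0, -2.0]) + noise_sd * rng.normal(size=n)

m0, S0 = np.zeros(2), np.eye(2)

# The posterior contracts as data accumulate: its covariance shrinks.
_, S_small = posterior(*sample(10), m0, S0, noise_var=0.09)
_, S_large = posterior(*sample(1000), m0, S0, noise_var=0.09)
print(np.trace(S_small), np.trace(S_large))  # second trace is far smaller
```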
Question 2
Answer: 1, 2, 4. Prediction under a Bayesian linear model requires first estimating the moments of the posterior distribution of the parameters. This is because the prediction is the expected likelihood of the new data under the posterior distribution.
The predictive distribution is Gaussian only if the posterior and likelihood distributions are Gaussian. The product of Gaussian density functions is also Gaussian.
The predictive distribution does not depend on the weights in the model; the weights are marginalized out under the expectation w.r.t. the posterior distribution. The variance of the predictive distribution typically contracts with increasing training data, because the variances of the posterior and the likelihood typically decrease as training data accumulate.
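The marginalization and the contraction of the predictive variance can be made concrete with a small sketch. A zero prior mean is assumed for brevity, and all settings (design, noise level, query point) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def predictive(x_star, X, y, noise_var, S0_inv):
    """Gaussian predictive distribution at x_star for Bayesian linear
    regression with a zero-mean Gaussian prior: the weights are marginalized
    out, leaving mean x_star.mN and variance noise_var + x_star.SN.x_star."""
    SN = np.linalg.inv(S0_inv + X.T @ X / noise_var)  # posterior covariance
    mN = SN @ (X.T @ y / noise_var)                   # posterior mean
    return x_star @ mN, noise_var + x_star @ SN @ x_star

X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([0.5, 1.5]) + 0.2 * rng.normal(size=50)
x_star = np.array([1.0, 0.3])

# Predictive variance contracts with more data, but never below the noise floor.
m5, v5 = predictive(x_star, X[:5], y[:5], 0.04, np.eye(2))
m50, v50 = predictive(x_star, X, y, 0.04, np.eye(2))
print(v5, v50)  # v50 < v5, and both exceed the noise variance 0.04
```

The noise variance term in the predictive variance is why the contraction stops at the likelihood's noise floor rather than going to zero.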
Question 3
Answer: 2, 3, 4.
Gaussian process regression is a Bayesian modeling approach, but it does not assume that the data are Gaussian distributed, nor does it make such an assumption about the error.
Gaussian processes place a probabilistic prior directly on the space of functions and model the posterior of the predictor using a parameterized kernel representation of the covariance matrix. Gaussian processes are fitted to data by maximizing the evidence for the kernel parameters. However, it is not necessarily the case that the choice of kernel is effectively a hyperparameter that can be optimized. While this could be achieved in an ad hoc way, there are other considerations, such as smoothness and the ability to extrapolate, that dictate the choice of kernel.
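Evidence maximization over kernel hyperparameters can be illustrated by scoring candidate length-scales with the log marginal likelihood. This is a numpy sketch on synthetic data; the candidate values and noise level are illustrative, and a grid search stands in for the gradient-based optimization typically used in practice:

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf(a, b, ell):
    """Squared-exponential kernel with length-scale ell."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def log_evidence(x, y, ell, noise_var):
    """Log marginal likelihood of a zero-mean GP:
    -0.5 y.K^{-1}y - 0.5 log|K| - (n/2) log(2 pi), via a Cholesky factor."""
    K = rbf(x, x, ell) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
            - 0.5 * len(x) * np.log(2 * np.pi))

# Smooth data: the evidence penalizes both over- and under-fitting length-scales.
x = np.linspace(0.0, 5.0, 40)
y = np.sin(x) + 0.1 * rng.normal(size=40)

scores = {ell: log_evidence(x, y, ell, 0.01) for ell in (0.05, 1.0, 50.0)}
best = max(scores, key=scores.get)
print(best)  # the moderate length-scale maximizes the evidence
```

The evidence automatically trades data fit against model complexity (the Occam's razor effect discussed in Rasmussen and Ghahramani (2001)), which is why the extreme length-scales score poorly.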
Python Notebooks
A number of notebooks are provided in the accompanying source code repository, beyond the two described in this chapter. These notebooks demonstrate the use of multi-output GPs and their application to CVA modeling (see Crépey and Dixon (2020) for details of these models). Further details of the notebooks are included in the README.md file.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Dixon, M.F., Halperin, I., Bilokon, P. (2020). Bayesian Regression and Gaussian Processes. In: Machine Learning in Finance. Springer, Cham. https://doi.org/10.1007/978-3-030-41068-1_3
DOI: https://doi.org/10.1007/978-3-030-41068-1_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41067-4
Online ISBN: 978-3-030-41068-1
eBook Packages: Mathematics and Statistics (R0)