Abstract
This chapter introduces Bayesian regression and shows how it extends many of the concepts in the previous chapter. We develop kernel-based machine learning methods—specifically Gaussian process regression, an important class of Bayesian machine learning methods—and demonstrate their application to “surrogate” models of derivative prices. This chapter also provides a natural starting point from which to develop intuition for the role and functional form of regularization in a frequentist setting—the subject of subsequent chapters.
Notes
1. Surrogate models learn the output of an existing mathematical or statistical model as a function of input data.
2. Note that the factor of 2 in the denominator of the second term does not cancel out because the derivative is w.r.t. \(\sigma _n^2\) and not \(\sigma _n\).
3. This is in contrast to non-linear regressions commonly used in finance, which attempt to parameterize a non-linear function with a set of weights.
4. This choice is not a real limitation in practice (since it applies only to the prior) and does not prevent the mean of the predictor from being nonzero.
5. Gardner et al. (2018) explored five different approximation methods known in the numerical analysis literature.
6. Note that the plot uses the original coordinates and not the re-scaled coordinates.
7. Such maturities might correspond to exposure evaluation times in CVA simulation, as in Crépey and Dixon (2020). The option model and the GP model are observed to produce very similar values.
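The surrogate idea in note 1 can be sketched in a few lines of numpy: fit a noise-free GP to Black–Scholes call prices on a sparse strike grid and query it between grid points. The strike grid, length-scale, and jitter below are illustrative choices, not the chapter's settings; note also that the zero-mean prior does not stop the posterior mean from tracking the (nonzero) prices, as note 4 observes.

```python
import numpy as np
from math import log, sqrt, exp, erf

def bs_call(S, K, T, r, sigma):
    """Black-Scholes call price: the 'existing model' the surrogate learns."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF
    return S * Phi(d1) - K * exp(-r * T) * Phi(d2)

def rbf(a, b, ell=5.0):
    """Squared-exponential (RBF) kernel on strikes."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

# Train the surrogate on a sparse strike grid of model prices.
K_train = np.linspace(80.0, 120.0, 9)
y_train = np.array([bs_call(100.0, K, 1.0, 0.01, 0.2) for K in K_train])

# Noise-free GP regression: the posterior mean interpolates the pricer.
Kxx = rbf(K_train, K_train) + 1e-8 * np.eye(len(K_train))  # jitter for stability
K_test = np.array([85.0, 92.5, 100.0, 107.5, 115.0])
mu = rbf(K_test, K_train) @ np.linalg.solve(Kxx, y_train)

exact = np.array([bs_call(100.0, K, 1.0, 0.01, 0.2) for K in K_test])
print(np.max(np.abs(mu - exact)))  # surrogate tracks the pricer closely
```

Once trained, the surrogate replaces calls to the pricer with a cheap matrix-vector product, which is the motivation for using GPs in the CVA setting of note 7.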
References
Alvarez, M., Rosasco, L., & Lawrence, N. (2012). Kernels for vector-valued functions: A review. Foundations and Trends in Machine Learning, 4(3), 195–266.
Bishop, C. M. (2006). Pattern recognition and machine learning (Information Science and Statistics). Berlin, Heidelberg: Springer-Verlag.
Bonilla, E. V., Chai, K. M. A., & Williams, C. K. I. (2007). Multi-task Gaussian process prediction. In Proceedings of the 20th International Conference on Neural Information Processing Systems, NIPS’07, USA (pp. 153–160). Curran Associates Inc.
Chen, Z., Wang, B., & Gorban, A. N. (2017, March). Multivariate Gaussian and Student-t process regression for multi-output prediction. ArXiv e-prints.
Cousin, A., Maatouk, H., & Rullière, D. (2016). Kriging of financial term structures. European Journal of Operational Research, 255, 631–648.
Crépey, S., & Dixon, M. (2020). Gaussian process regression for derivative portfolio modeling and application to CVA computations. Computational Finance.
da Barrosa, M. R., Salles, A. V., & de Oliveira Ribeiro, C. (2016). Portfolio optimization through kriging methods. Applied Economics, 48(50), 4894–4905.
Fang, F., & Oosterlee, C. W. (2008). A novel pricing method for European options based on Fourier-cosine series expansions. SIAM Journal on Scientific Computing.
Gardner, J., Pleiss, G., Wu, R., Weinberger, K., & Wilson, A. (2018). Product kernel interpolation for scalable Gaussian processes. In International Conference on Artificial Intelligence and Statistics (pp. 1407–1416).
Gramacy, R., & Apley, D. (2015). Local Gaussian process approximation for large computer experiments. Journal of Computational and Graphical Statistics, 24(2), 561–578.
Bühler, H., Gonon, L., Teichmann, J., & Wood, B. (2018). Deep hedging. Quantitative Finance. Forthcoming (preprint version available as arXiv:1802.03042).
Hernandez, A. (2017). Model calibration with neural networks. Risk Magazine (June 1–5). Preprint version available at SSRN.2812140, code available at https://github.com/Andres-Hernandez/CalibrationNN.
Liu, M., & Staum, J. (2010). Stochastic kriging for efficient nested simulation of expected shortfall. Journal of Risk, 12(3), 3–27.
Ludkovski, M. (2018). Kriging metamodels and experimental design for Bermudan option pricing. Journal of Computational Finance, 22(1), 37–77.
MacKay, D. J. (1998). Introduction to Gaussian processes. In C. M. Bishop (Ed.), Neural networks and machine learning. Springer-Verlag.
Melkumyan, A., & Ramos, F. (2011). Multi-kernel Gaussian processes. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Two, IJCAI’11 (pp. 1408–1413). AAAI Press.
Micchelli, C. A., Xu, Y., & Zhang, H. (2006, December). Universal kernels. Journal of Machine Learning Research, 7, 2651–2667.
Murphy, K. (2012). Machine learning: A probabilistic perspective. The MIT Press.
Neal, R. M. (1996). Bayesian learning for neural networks. Volume 118 of Lecture Notes in Statistics. Springer.
Pillonetto, G., Dinuzzo, F., & Nicolao, G. D. (2010, February). Bayesian online multitask learning of Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(2), 193–205.
Rasmussen, C. E., & Ghahramani, Z. (2001). Occam’s razor. In Advances in Neural Information Processing Systems 13 (pp. 294–300). MIT Press.
Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. MIT Press.
Roberts, S., Osborne, M., Ebden, M., Reece, S., Gibson, N., & Aigrain, S. (2013). Gaussian processes for time-series modelling. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 371(1984).
Scholkopf, B., & Smola, A. J. (2001). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge, MA, USA: MIT Press.
Spiegeleer, J. D., Madan, D. B., Reyners, S., & Schoutens, W. (2018). Machine learning for quantitative finance: Fast derivative pricing, hedging and fitting. Quantitative Finance, 1–9.
Weinan, E., Han, J., & Jentzen, A. (2017). Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. arXiv:1706.04702.
Whittle, P., & Sargent, T. J. (1983). Prediction and regulation by linear least-square methods (NED - New edition ed.). University of Minnesota Press.
Appendix
Answers to Multiple Choice Questions
Question 1
Answer: 1, 4, 5.
Parametric Bayesian regression always treats the regression weights as random variables.
In Bayesian regression the data function f(x) is observed only if the data is assumed to be noise-free; otherwise, the function is not directly observed.
The posterior distribution of the parameters will only be Gaussian if both the prior and the likelihood function are Gaussian. The distribution of the likelihood function depends on the assumed error distribution.
The posterior distribution of the regression weights will typically contract with increasing data: the posterior precision matrix grows as data accumulate, and hence the posterior variance shrinks. There are exceptions if, for example, there are outliers in the data.
The mean of the posterior distribution depends on both the mean and covariance of the prior when the prior is Gaussian. We can see this from Eq. 3.19.
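The contraction argument can be checked numerically with the standard conjugate-Gaussian update for the weights. This is a sketch: the design matrix, prior, and noise level below are illustrative, and the update formulas are the usual conjugate ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior(X, y, m0, S0, noise_var):
    """Conjugate update for Bayesian linear regression: Gaussian prior
    N(m0, S0) on the weights, Gaussian noise with variance noise_var."""
    S0_inv = np.linalg.inv(S0)
    SN = np.linalg.inv(S0_inv + X.T @ X / noise_var)   # posterior covariance
    mN = SN @ (S0_inv @ m0 + X.T @ y / noise_var)      # posterior mean
    return mN, SN

def sample(n, noise_sd=0.3):
    """Synthetic data: intercept plus one feature, known true weights."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    return X, X @ np.array([1.0, -2.0]) + noise_sd * rng.normal(size=n)

m0, S0 = np.zeros(2), np.eye(2)

# The posterior contracts as data accumulate: its covariance shrinks.
_, S_small = posterior(*sample(10), m0, S0, noise_var=0.09)
_, S_large = posterior(*sample(1000), m0, S0, noise_var=0.09)
print(np.trace(S_small), np.trace(S_large))  # second trace is far smaller
```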
Question 2
Answer: 1, 2, 4. Prediction under a Bayesian linear model requires first estimating the moments of the posterior distribution of the parameters. This is because the prediction is the expected likelihood of the new data under the posterior distribution.
The predictive distribution is Gaussian only if the posterior and likelihood distributions are Gaussian. The product of Gaussian density functions is also Gaussian.
The predictive distribution does not depend on the weights in the model; the weights are marginalized out under the expectation w.r.t. the posterior distribution. The variance of the predictive distribution typically contracts with increasing training data, because the variances of the posterior and the likelihood typically decrease as training data accumulate.
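The marginalization and the contraction of the predictive variance can be made concrete with a small sketch. A zero prior mean is assumed for brevity, and all settings (design, noise level, query point) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def predictive(x_star, X, y, noise_var, S0_inv):
    """Gaussian predictive distribution at x_star for Bayesian linear
    regression with a zero-mean Gaussian prior: the weights are marginalized
    out, leaving mean x_star.mN and variance noise_var + x_star.SN.x_star."""
    SN = np.linalg.inv(S0_inv + X.T @ X / noise_var)  # posterior covariance
    mN = SN @ (X.T @ y / noise_var)                   # posterior mean
    return x_star @ mN, noise_var + x_star @ SN @ x_star

X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([0.5, 1.5]) + 0.2 * rng.normal(size=50)
x_star = np.array([1.0, 0.3])

# Predictive variance contracts with more data, but never below the noise floor.
m5, v5 = predictive(x_star, X[:5], y[:5], 0.04, np.eye(2))
m50, v50 = predictive(x_star, X, y, 0.04, np.eye(2))
print(v5, v50)  # v50 < v5, and both exceed the noise variance 0.04
```

The noise variance term in the predictive variance is why the contraction stops at the likelihood's noise floor rather than going to zero.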
Question 3
Answer: 2, 3, 4.
Gaussian process regression is a Bayesian modeling approach, but it does not assume that the data are Gaussian distributed, nor does it make such an assumption about the error.
Gaussian processes place a probabilistic prior directly on the space of functions and model the posterior of the predictor using a parameterized kernel representation of the covariance matrix. Gaussian processes are fitted to data by maximizing the evidence for the kernel parameters. However, it is not necessarily the case that the choice of kernel is effectively a hyperparameter that can be optimized. While this could be achieved in an ad hoc way, there are other considerations, such as smoothness and the ability to extrapolate, that dictate the choice of kernel.
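Evidence maximization over kernel hyperparameters can be illustrated by scoring candidate length-scales with the log marginal likelihood. This is a numpy sketch on synthetic data; the candidate values and noise level are illustrative, and a grid search stands in for the gradient-based optimization typically used in practice:

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf(a, b, ell):
    """Squared-exponential kernel with length-scale ell."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def log_evidence(x, y, ell, noise_var):
    """Log marginal likelihood of a zero-mean GP:
    -0.5 y.K^{-1}y - 0.5 log|K| - (n/2) log(2 pi), via a Cholesky factor."""
    K = rbf(x, x, ell) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
            - 0.5 * len(x) * np.log(2 * np.pi))

# Smooth data: the evidence penalizes both over- and under-fitting length-scales.
x = np.linspace(0.0, 5.0, 40)
y = np.sin(x) + 0.1 * rng.normal(size=40)

scores = {ell: log_evidence(x, y, ell, 0.01) for ell in (0.05, 1.0, 50.0)}
best = max(scores, key=scores.get)
print(best)  # the moderate length-scale maximizes the evidence
```

The evidence automatically trades data fit against model complexity (the Occam's razor effect discussed in Rasmussen and Ghahramani (2001)), which is why the extreme length-scales score poorly.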
Python Notebooks
A number of notebooks are provided in the accompanying source code repository, beyond the two described in this chapter. These notebooks demonstrate the use of multi-output GPs and their application to CVA modeling (see Crépey and Dixon (2020) for details of these models). Further details of the notebooks are included in the README.md file.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Dixon, M.F., Halperin, I., Bilokon, P. (2020). Bayesian Regression and Gaussian Processes. In: Machine Learning in Finance. Springer, Cham. https://doi.org/10.1007/978-3-030-41068-1_3
DOI: https://doi.org/10.1007/978-3-030-41068-1_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41067-4
Online ISBN: 978-3-030-41068-1
eBook Packages: Mathematics and Statistics (R0)