
Bayesian Regression and Gaussian Processes


Abstract

This chapter introduces Bayesian regression and shows how it extends many of the concepts in the previous chapter. We develop kernel-based machine learning methods—specifically Gaussian process regression, an important class of Bayesian machine learning methods—and demonstrate their application to “surrogate” models of derivative prices. This chapter also provides a natural starting point from which to develop intuition for the role and functional form of regularization in a frequentist setting—the subject of subsequent chapters.


Notes

  1.

    Surrogate models learn the output of an existing mathematical or statistical model as a function of input data.

  2.

    Note that the factor of 2 in the denominator of the second term does not cancel out because the derivative is taken with respect to \(\sigma_n^2\) and not \(\sigma_n\); a worked step appears at the end of these notes.

  3.

    This is in contrast to non-linear regressions commonly used in finance, which attempt to parameterize a non-linear function with a set of weights.

  4.

    This choice is not a real limitation in practice, since it applies only to the prior, and does not prevent the mean of the posterior predictor from being nonzero.

  5.

    Gardner et al. (2018) explored 5 different approximation methods known in the numerical analysis literature.

  6.

    Note that the plot uses the original coordinates and not the re-scaled coordinates.

  7.

    Such maturities might correspond to exposure evaluation times in CVA simulation, as in Crépey and Dixon (2020). The option model and the GP model are observed to produce very similar values.
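
To make note 2 concrete: the chapter's equation is not reproduced here, so consider the generic Gaussian noise term \((y-f)^2/(2\sigma_n^2)\) appearing in the log-likelihood. Differentiating with respect to \(\sigma_n^2\) leaves the factor of 2 in place, whereas differentiating with respect to \(\sigma_n\) cancels it via the chain rule:

\[ \frac{\partial}{\partial \sigma_n^2}\,\frac{(y-f)^2}{2\sigma_n^2} = -\frac{(y-f)^2}{2\sigma_n^4}, \qquad \frac{\partial}{\partial \sigma_n}\,\frac{(y-f)^2}{2\sigma_n^2} = -\frac{(y-f)^2}{\sigma_n^3}. \]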

References

  • Alvarez, M., Rosasco, L., & Lawrence, N. (2012). Kernels for vector-valued functions: A review. Foundations and Trends in Machine Learning, 4(3), 195–266.

  • Bishop, C. M. (2006). Pattern recognition and machine learning (Information Science and Statistics). Berlin, Heidelberg: Springer-Verlag.

  • Bonilla, E. V., Chai, K. M. A., & Williams, C. K. I. (2007). Multi-task Gaussian process prediction. In Proceedings of the 20th International Conference on Neural Information Processing Systems, NIPS'07, USA (pp. 153–160). Curran Associates Inc.

  • Bühler, H., Gonon, L., Teichmann, J., & Wood, B. (2018). Deep hedging. Quantitative Finance. Forthcoming (preprint version available as arXiv:1802.03042).

  • Chen, Z., Wang, B., & Gorban, A. N. (2017, March). Multivariate Gaussian and Student-t process regression for multi-output prediction. ArXiv e-prints.

  • Cousin, A., Maatouk, H., & Rullière, D. (2016). Kriging of financial term structures. European Journal of Operational Research, 255, 631–648.

  • Crépey, S., & Dixon, M. (2020). Gaussian process regression for derivative portfolio modeling and application to CVA computations. Computational Finance.

  • da Barrosa, M. R., Salles, A. V., & de Oliveira Ribeiro, C. (2016). Portfolio optimization through kriging methods. Applied Economics, 48(50), 4894–4905.

  • E, W., Han, J., & Jentzen, A. (2017). Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. arXiv:1706.04702.

  • Fang, F., & Oosterlee, C. W. (2008). A novel pricing method for European options based on Fourier-cosine series expansions. SIAM Journal on Scientific Computing.

  • Gardner, J., Pleiss, G., Wu, R., Weinberger, K., & Wilson, A. (2018). Product kernel interpolation for scalable Gaussian processes. In International Conference on Artificial Intelligence and Statistics (pp. 1407–1416).

  • Gramacy, R., & Apley, D. (2015). Local Gaussian process approximation for large computer experiments. Journal of Computational and Graphical Statistics, 24(2), 561–578.

  • Hernandez, A. (2017). Model calibration with neural networks. Risk Magazine, June, 1–5. Preprint version available at SSRN 2812140; code available at https://github.com/Andres-Hernandez/CalibrationNN.

  • Liu, M., & Staum, J. (2010). Stochastic kriging for efficient nested simulation of expected shortfall. Journal of Risk, 12(3), 3–27.

  • Ludkovski, M. (2018). Kriging metamodels and experimental design for Bermudan option pricing. Journal of Computational Finance, 22(1), 37–77.

  • MacKay, D. J. (1998). Introduction to Gaussian processes. In C. M. Bishop (Ed.), Neural networks and machine learning. Springer-Verlag.

  • Melkumyan, A., & Ramos, F. (2011). Multi-kernel Gaussian processes. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Two, IJCAI'11 (pp. 1408–1413). AAAI Press.

  • Micchelli, C. A., Xu, Y., & Zhang, H. (2006, December). Universal kernels. Journal of Machine Learning Research, 7, 2651–2667.

  • Murphy, K. (2012). Machine learning: A probabilistic perspective. The MIT Press.

  • Neal, R. M. (1996). Bayesian learning for neural networks. Volume 118 of Lecture Notes in Statistics. Springer.

  • Pillonetto, G., Dinuzzo, F., & Nicolao, G. D. (2010, February). Bayesian online multitask learning of Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(2), 193–205.

  • Rasmussen, C. E., & Ghahramani, Z. (2001). Occam's razor. In Advances in Neural Information Processing Systems 13 (pp. 294–300). MIT Press.

  • Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. MIT Press.

  • Roberts, S., Osborne, M., Ebden, M., Reece, S., Gibson, N., & Aigrain, S. (2013). Gaussian processes for time-series modelling. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 371(1984).

  • Schölkopf, B., & Smola, A. J. (2001). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge, MA, USA: MIT Press.

  • Spiegeleer, J. D., Madan, D. B., Reyners, S., & Schoutens, W. (2018). Machine learning for quantitative finance: Fast derivative pricing, hedging and fitting. Quantitative Finance, 1–9.

  • Whittle, P., & Sargent, T. J. (1983). Prediction and regulation by linear least-square methods (NED - New edition ed.). University of Minnesota Press.


Appendix

Answers to Multiple Choice Questions

Question 1

Answer: 1, 4, 5.

Parametric Bayesian regression always treats the regression weights as random variables.

In Bayesian regression the data function f(x) is only observed if the data is assumed to be noise-free. Otherwise, the function is not directly observed.

The posterior distribution of the parameters will only be Gaussian if both the prior and the likelihood function are Gaussian. The form of the likelihood depends on the assumed error distribution.

The posterior distribution of the regression weights will typically contract with increasing data: the posterior precision matrix grows as observations accumulate, and hence the posterior variance shrinks. There are exceptions if, for example, there are outliers in the data.

The mean of the posterior distribution depends on both the mean and covariance of the prior if it is Gaussian. We can see this from Eq. 3.19, and numerically in the sketch below.
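
To make the contraction concrete, here is a minimal numerical sketch using the standard conjugate Gaussian update for Bayesian linear regression. The notation \(m_N\), \(S_N\) and the prior and noise settings below are illustrative choices, not taken verbatim from Eq. 3.19:

    import numpy as np

    # Conjugate Gaussian update for Bayesian linear regression:
    #   S_N^{-1} = S_0^{-1} + X^T X / sigma^2
    #   m_N      = S_N (S_0^{-1} m_0 + X^T y / sigma^2)
    def posterior(X, y, m0, S0, sigma):
        S0_inv = np.linalg.inv(S0)
        SN = np.linalg.inv(S0_inv + (X.T @ X) / sigma**2)
        mN = SN @ (S0_inv @ m0 + (X.T @ y) / sigma**2)
        return mN, SN

    rng = np.random.default_rng(0)
    w_true, sigma = np.array([1.0, -2.0]), 0.3
    m0, S0 = np.zeros(2), np.eye(2)
    for n in (5, 50, 500):
        X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
        y = X @ w_true + sigma * rng.standard_normal(n)
        mN, SN = posterior(X, y, m0, S0, sigma)
        # trace(S_N) shrinks as n grows: the posterior contracts
        print(n, mN.round(3), np.trace(SN).round(5))

The posterior mean also tends toward the least-squares solution as the data overwhelm the prior.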

Question 2

Answer: 1, 2, 4. Prediction under a Bayesian linear model requires first estimating the moments of the posterior distribution of the parameters. This is because the prediction is the expected likelihood of the new data under the posterior distribution.

The predictive distribution is Gaussian only if the posterior and likelihood distributions are Gaussian. The product of Gaussian density functions is also Gaussian.

The predictive distribution does not depend on the weights in the model: they are marginalized out under the expectation with respect to the posterior distribution, as written out below. The variance of the predictive distribution typically contracts with increasing training data because the variances of the posterior and the likelihood typically decrease with increasing training data.
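
In the Gaussian case the marginalization can be written out explicitly. With posterior moments \(m_N\) and \(S_N\) (as in the sketch after Question 1) and noise variance \(\sigma^2\), the predictive density at a new input \(x_*\) is

\[ p(y_* \mid x_*, \mathcal{D}) = \int p(y_* \mid x_*, w)\, p(w \mid \mathcal{D})\, dw = \mathcal{N}\!\left( y_* \mid x_*^\top m_N,\; x_*^\top S_N x_* + \sigma^2 \right), \]

so the weights do not appear, and the first variance term contracts with increasing data while the noise term \(\sigma^2\) sets the floor.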

Question 3

Answer: 2, 3, 4.

Gaussian process regression is a Bayesian modeling approach, but it does not assume that the data is Gaussian distributed, nor does it make such an assumption about the error.

Gaussian processes place a probabilistic prior directly on the space of functions and model the posterior of the predictor using a parameterized kernel representation of the covariance matrix. Gaussian processes are fitted to data by maximizing the evidence with respect to the kernel parameters, as sketched below. However, it is not necessarily the case that the choice of kernel itself is effectively a hyperparameter that can be optimized: while this could be attempted in an ad hoc way, other considerations, such as smoothness and the ability to extrapolate, dictate the choice of kernel.
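
A minimal sketch of evidence maximization, assuming scikit-learn's GaussianProcessRegressor rather than the chapter's own notebook code; the RBF-plus-noise kernel and the synthetic data are illustrative choices:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(1)
    X = rng.uniform(0.0, 5.0, 40).reshape(-1, 1)
    y = np.sin(X).ravel() + 0.1 * rng.standard_normal(40)

    # fit() maximizes the log marginal likelihood (the evidence)
    # over the kernel hyperparameters: signal scale, length scale,
    # and noise level.
    kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
    gp = GaussianProcessRegressor(kernel=kernel).fit(X, y)

    print(gp.kernel_)                         # optimized hyperparameters
    print(gp.log_marginal_likelihood_value_)  # evidence at the optimum
    mean, std = gp.predict(np.array([[2.5]]), return_std=True)

Note that only the hyperparameters of a fixed kernel family are optimized here; swapping the RBF for, say, a Matérn kernel is a modeling decision of the kind discussed above.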

Python Notebooks

A number of notebooks are provided in the accompanying source code repository, beyond the two described in this chapter. These notebooks demonstrate the use of multi-output GPs and their application to CVA modeling (see Crépey and Dixon (2020) for details of these models). Further details of the notebooks are included in the README.md file. A minimal illustration of the surrogate idea follows below.
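
The notebooks themselves are not reproduced here, but the surrogate idea of note 1 can be illustrated in a few lines: fit a GP to Black-Scholes call prices on a grid of strikes, then price off-grid strikes from the GP. The parameter values and kernel below are illustrative, not those used in the chapter's notebooks:

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def bs_call(S, K, T, r, sigma):
        """Black-Scholes price of a European call."""
        d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
        d2 = d1 - sigma * np.sqrt(T)
        return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

    S, T, r, sigma = 100.0, 1.0, 0.02, 0.2
    K_train = np.linspace(60.0, 140.0, 15).reshape(-1, 1)
    y_train = bs_call(S, K_train.ravel(), T, r, sigma)

    # The training "observations" are noise-free model outputs,
    # so a tiny jitter (alpha) suffices for numerical stability.
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=20.0),
                                  alpha=1e-10, normalize_y=True)
    gp.fit(K_train, y_train)

    K_test = np.array([[95.0], [105.0]])
    print(gp.predict(K_test))                       # surrogate prices
    print(bs_call(S, K_test.ravel(), T, r, sigma))  # exact prices

Once fitted, the surrogate can be evaluated at many inputs far more cheaply than re-running the original pricing model, which is the point of the approach.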


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Dixon, M.F., Halperin, I., Bilokon, P. (2020). Bayesian Regression and Gaussian Processes. In: Machine Learning in Finance. Springer, Cham. https://doi.org/10.1007/978-3-030-41068-1_3
