Braverman Readings in Machine Learning: Key Ideas from Inception to Current State, pp. 103–121

# Conformal Predictive Distributions with Kernels

## Abstract

This paper reviews the checkered history of predictive distributions in statistics and discusses two developments, one from the recent literature and the other new. The first development is bringing predictive distributions into machine learning, whose early development was deeply influenced by two remarkable groups at the Institute of Automation and Remote Control. As a result, predictive distributions become more robust, and their validity ceases to depend on Bayesian or narrow parametric assumptions. The second development is combining predictive distributions with kernel methods, which originated in one of those groups, the one that included Emmanuel Braverman. As a result, predictive distributions become more flexible, and their predictive efficiency therefore improves significantly on realistic non-linear data sets.
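The combination described in the abstract can be illustrated with a minimal sketch. The code below is not the paper's algorithm: it uses the simpler split-conformal construction (a proper training set plus a calibration set) rather than full conformal prediction, with a hand-rolled RBF kernel ridge regressor as the underlying predictor. All function names and parameters are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ridge_fit(X, y, lam=1.0, gamma=1.0):
    # Dual coefficients of kernel ridge regression: (K + lam*I) alpha = y
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(alpha, X_train, X_new, gamma=1.0):
    return rbf_kernel(X_new, X_train, gamma) @ alpha

def split_conformal_cdf(X_train, y_train, X_cal, y_cal, x, y_grid,
                        lam=1.0, gamma=1.0):
    """Split-conformal predictive CDF Q(y) at a test point x.

    Q(y) = (#{calibration i : mu(x) + r_i <= y} + 1) / (n_cal + 1),
    where r_i are the calibration residuals of the kernel ridge fit.
    """
    alpha = kernel_ridge_fit(X_train, y_train, lam, gamma)
    resid = y_cal - kernel_ridge_predict(alpha, X_train, X_cal, gamma)
    mu = kernel_ridge_predict(alpha, X_train, x[None, :], gamma)[0]
    return np.array([(np.sum(mu + resid <= y) + 1) / (len(resid) + 1)
                     for y in y_grid])
```

By construction the output is a step function that increases from near 0 to 1 over the grid, i.e. a genuine (randomized, up to ties) distribution function for the test label; the kernel choice controls how well the point predictor, and hence the sharpness of the distribution, adapts to non-linear data.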

## Keywords

Conformal prediction · Fiducial inference · Predictive distributions

## Notes

### Acknowledgements

This work has been supported by the EU Horizon 2020 Research and Innovation programme (in the framework of the ExCAPE project under grant agreement 671555) and Astra Zeneca (in the framework of the project “Machine Learning for Chemical Synthesis”).
