Conformal Predictive Distributions with Kernels

  • Vladimir Vovk
  • Ilia Nouretdinov
  • Valery Manokhin
  • Alex Gammerman
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11100)


This paper reviews the checkered history of predictive distributions in statistics and discusses two developments, one from recent literature and the other new. The first development is bringing predictive distributions into machine learning, whose early development was deeply influenced by two remarkable groups at the Institute of Automation and Remote Control. As a result, predictive distributions become more robust, and their validity ceases to depend on Bayesian or narrow parametric assumptions. The second development is combining predictive distributions with kernel methods, which were originated by one of those groups, including Emmanuel Braverman. As a result, predictive distributions become more flexible and, therefore, their predictive efficiency improves significantly for realistic non-linear data sets.


Keywords: Conformal prediction · Fiducial inference · Predictive distributions



This work has been supported by the EU Horizon 2020 Research and Innovation programme (in the framework of the ExCAPE project under grant agreement 671555) and Astra Zeneca (in the framework of the project “Machine Learning for Chemical Synthesis”).


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Vladimir Vovk (1)
  • Ilia Nouretdinov (1)
  • Valery Manokhin (1)
  • Alex Gammerman (1)
  1. Royal Holloway, University of London, Egham, Surrey, UK
