Explaining the Predictions of an Arbitrary Prediction Model: Feature Contributions and Quasi-nomograms

Štrumbelj, Erik; Kononenko, Igor

doi:10.1007/978-3-319-90403-0_8

Part of the book series: Human–Computer Interaction Series (HCIS)

Abstract

Acquiring knowledge from data is the quintessential task of machine learning. The knowledge extracted this way might not be suitable for immediate use, so one or more data postprocessing methods may need to be applied as well. Data postprocessing includes the integration, filtering, evaluation, and explanation of acquired knowledge. Nomograms, graphical devices for the approximate calculation of functions, are a useful tool for visualising and comparing prediction models. It is well known that any generalised additive model can be represented by a quasi-nomogram, that is, a nomogram that requires the human user to perform some summation. Nomograms of this type are widely used, especially in medical prognostics. Methods for constructing such nomograms have been developed for specific types of prediction models, and thus assume that the structure of the model is known. In this chapter we extend our previous work on a general method for explaining arbitrary prediction models (classification or regression) to a general methodology for constructing a quasi-nomogram for a black-box prediction model. We show that, for an additive model, such a quasi-nomogram is equivalent to the one we would construct if the structure of the model were known.
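
The claim about additive models can be illustrated with a small sketch. It is not the chapter's algorithm: it uses plain marginal averaging over a data sample as a simplified stand-in, and the model `black_box`, the helper names, and the synthetic data are all hypothetical. Each feature gets an effect function obtained only by querying the model; summing one effect per feature and adding the average prediction should reproduce the prediction of an additive model.

```python
import random


def black_box(x):
    # Hypothetical additive model; the code below only ever calls it.
    return 2.0 * x[0] + x[1] ** 2


def mean_prediction(model, data):
    """Average model output over the data sample (the baseline)."""
    return sum(model(row) for row in data) / len(data)


def marginal_effect(model, data, feature, value):
    """Average prediction with `feature` fixed to `value`, minus the baseline."""
    fixed = [row[:feature] + [value] + row[feature + 1:] for row in data]
    return mean_prediction(model, fixed) - mean_prediction(model, data)


def effect_axis(model, data, feature, grid):
    """One quasi-nomogram axis: (feature value, effect) pairs over a grid."""
    return [(v, marginal_effect(model, data, feature, v)) for v in grid]


def quasi_nomogram_prediction(model, data, instance):
    """Read one effect per feature, then sum the effects with the baseline."""
    baseline = mean_prediction(model, data)
    effects = [marginal_effect(model, data, i, v) for i, v in enumerate(instance)]
    return baseline + sum(effects)


# Synthetic sample standing in for a training set (an assumption of this sketch).
rng = random.Random(0)
data = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]

instance = [0.5, -0.3]
reconstructed = quasi_nomogram_prediction(black_box, data, instance)
axis_for_first_feature = effect_axis(black_box, data, 0, [-1.0, 0.0, 1.0])
```

For this additive `black_box`, `reconstructed` agrees with `black_box(instance)` up to floating-point error, and each effect equals the feature's own term minus its sample mean. For a model with interactions between features, the summed effects would in general only approximate the prediction.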

Notes

1. www.sciencedirect.com currently lists 1393 research papers that feature the word “nomogram” in the title, keywords, or abstract and were published between 2006 and 2015. Most of them are from the medical field.
2. Linear regression is, of course, just a special case of a generalised additive model, with an identity link function and linear effect functions.
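
The special case in Note 2 can be written out as follows. The symbols ($g$ for the link function, $f_i$ for the effect functions, $\beta_i$ for the coefficients) are notation assumed here for illustration and are not taken from the chapter.

```latex
% Generalised additive model with link function g and effect functions f_i
g\bigl(\mathrm{E}[y \mid x_1, \dots, x_n]\bigr) = \beta_0 + \sum_{i=1}^{n} f_i(x_i)

% Identity link, g(\mu) = \mu, and linear effects, f_i(x_i) = \beta_i x_i,
% give ordinary linear regression
\mathrm{E}[y \mid x_1, \dots, x_n] = \beta_0 + \sum_{i=1}^{n} \beta_i x_i
```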

Author information

Authors and Affiliations

Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, 1000, Ljubljana, Slovenia
Erik Štrumbelj & Igor Kononenko

Corresponding author

Correspondence to Erik Štrumbelj.

Editor information

Editors and Affiliations

DATA61, CSIRO, Eveleigh, New South Wales, Australia
Jianlong Zhou
DATA61, CSIRO, Eveleigh, New South Wales, Australia
Fang Chen

About this chapter

Cite this chapter

Štrumbelj, E., Kononenko, I. (2018). Explaining the Predictions of an Arbitrary Prediction Model: Feature Contributions and Quasi-nomograms. In: Zhou, J., Chen, F. (eds) Human and Machine Learning. Human–Computer Interaction Series. Springer, Cham. https://doi.org/10.1007/978-3-319-90403-0_8

DOI: https://doi.org/10.1007/978-3-319-90403-0_8
Published: 08 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-90402-3
Online ISBN: 978-3-319-90403-0
eBook Packages: Computer Science, Computer Science (R0)
