The importance of interpretability and visualization in machine learning for applications in medicine and health care

  • Alfredo Vellido
WSOM 2017


In a short period of time, many areas of science have made a sharp transition towards data-dependent methods. In some cases, this process has been enabled by simultaneous advances in data acquisition and the development of networked system technologies. This new situation is particularly clear in the life sciences, where data overabundance has sparked a flurry of new methodologies for data management and analysis. This can be seen as a perfect scenario for the use of machine learning and computational intelligence techniques to address problems in which more traditional data analysis approaches might struggle. However, this scenario also poses some serious challenges. One of them is model interpretability and explainability, especially for complex nonlinear models. In some areas, such as medicine and health care, failing to address this challenge might seriously limit the chances that computer-based systems relying on machine learning and computational intelligence methods for data analysis are adopted in real practice. In this paper, we reflect on recent investigations into the interpretability and explainability of machine learning methods and discuss their impact on medicine and health care. We pay specific attention to one of the ways in which interpretability and explainability can be addressed in this context: data and model visualization. We argue that, beyond improving model interpretability as a goal in itself, we need to involve medical experts in the design of strategies for interpreting data analysis. Otherwise, machine learning is unlikely to become a part of routine clinical and health care practice.


Interpretability, Explainability, Machine learning, Visualization, Medicine, Health care



This work was funded by the MINECO Spanish TIN2016-79576-R Project.

Compliance with ethical standards

Conflict of interest

The author declares that he has no conflict of interest.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Computer Science Department, Intelligent Data Science and Artificial Intelligence (IDEAI) Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain
