Concept-Centric Visual Turing Tests for Method Validation

  • Tatiana Fountoukidou
  • Raphael Sznitman
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11768)


Recent advances in machine learning for medical imaging have led to impressive increases in model complexity and overall capabilities. However, the ability to discern the precise information a machine learning method uses to make decisions has lagged behind, and it is often unclear how this performance is in fact achieved. Conventional evaluation metrics that reduce method performance to a single number or a curve provide only limited insight. Yet systems used in clinical practice demand thorough validation that such crude characterizations miss. To this end, we present a framework to evaluate classification methods based on a number of interpretable concepts that are crucial for a clinical task. Our approach is inspired by the Turing Test and asks how to devise a test that adaptively questions a method on its ability to interpret medical images. To do this, we use a Twenty Questions paradigm in which a probabilistic model characterizes the method’s capacity to grasp task-specific concepts, and we introduce a strategy to sequentially query the method according to its previous answers. The results show that the probabilistic model is able to expose both the dataset’s and the method’s biases, and can be used to reduce the number of queries needed for confident performance evaluation.
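The idea of sequentially querying a method according to its previous answers can be illustrated with a toy sketch. The code below is not the probabilistic model or query strategy of the paper, which the abstract does not specify. It assumes, purely for illustration, an independent Beta(1, 1) prior on the method's accuracy for each concept and a greedy rule that always asks about the concept whose accuracy estimate currently has the largest posterior variance. The names `run_adaptive_test` and `answer_fn`, and the concept labels in the usage note, are hypothetical.

```python
import random


def run_adaptive_test(answer_fn, concepts, n_queries, seed=0):
    """Query a method adaptively and return its estimated accuracy per concept.

    answer_fn(concept, rng) is a hypothetical stand-in for "ask the method a
    question about this concept and check the answer"; it returns True when
    the method answered correctly.
    """
    rng = random.Random(seed)
    # Assumed Beta(1, 1) prior per concept, stored as [a, b] pseudo-counts:
    # a = 1 + correct answers, b = 1 + wrong answers.
    counts = {c: [1, 1] for c in concepts}

    def variance(concept):
        # Variance of a Beta(a, b) distribution.
        a, b = counts[concept]
        return a * b / ((a + b) ** 2 * (a + b + 1))

    for _ in range(n_queries):
        # Greedy rule (an assumption): query the most uncertain concept.
        concept = max(concepts, key=variance)
        correct = answer_fn(concept, rng)
        counts[concept][0 if correct else 1] += 1

    # Posterior mean accuracy for each concept.
    return {c: a / (a + b) for c, (a, b) in counts.items()}
```

For example, `run_adaptive_test(lambda c, rng: c == "lesion", ["lesion", "vessel"], 10)` simulates a method that always answers correctly about a hypothetical "lesion" concept and always incorrectly about "vessel"; the returned estimates separate the two concepts after only a few queries each.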



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. ARTORG Center, University of Bern, Bern, Switzerland
