
Human-Understandable Decision Making for Visual Recognition

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12714)

Abstract

Deep neural networks have achieved substantial success in many tasks. However, a large gap remains between the operating mechanisms of deep learning models and human-understandable decision making, so humans cannot fully trust the predictions these models make. To date, little work has been done on aligning the behavior of deep learning models with human perception in order to train a human-understandable model. To fill this gap, we propose a new framework that trains a deep neural network by incorporating a prior on human perception into the learning process. Our model mimics the process of perceiving conceptual parts in an image and assessing their relative contributions to the final recognition. We evaluate the effectiveness of our model on two classical visual recognition tasks. The experimental results and analysis confirm that our model not only provides interpretable explanations for its predictions but also maintains competitive recognition accuracy.
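To make the abstract's part-based idea concrete, here is a minimal, hypothetical sketch of how a contribution-weighted, part-based classifier head could be structured: attention maps carve backbone features into conceptual parts, each part is classified on its own, and a learned gate scores how much each part contributes to the final prediction. This is illustrative only, not the authors' implementation; every module name, dimension, and design choice below is an assumption.

```python
# Illustrative sketch (not the paper's actual code): a head that extracts
# K "conceptual part" features from a CNN feature map, classifies each part
# separately, and combines the part-level logits with learned contribution
# weights, so every prediction can be traced back to its parts.
import torch
import torch.nn as nn

class PartBasedClassifier(nn.Module):
    def __init__(self, backbone_channels=512, num_parts=4, num_classes=200):
        super().__init__()
        # One 1x1-conv attention map per conceptual part.
        self.part_attention = nn.Conv2d(backbone_channels, num_parts, kernel_size=1)
        # A separate classifier over each part's pooled features.
        self.part_classifiers = nn.ModuleList(
            [nn.Linear(backbone_channels, num_classes) for _ in range(num_parts)]
        )
        # A gate that scores each part's relative contribution.
        self.contribution_gate = nn.Linear(backbone_channels * num_parts, num_parts)

    def forward(self, features):  # features: (B, C, H, W) from any CNN backbone
        # Spatial softmax turns each part map into a distribution over locations.
        attn = torch.softmax(self.part_attention(features).flatten(2), dim=-1)  # (B, K, HW)
        flat = features.flatten(2)                                              # (B, C, HW)
        # Attention-weighted pooling: one C-dim feature vector per part.
        part_feats = torch.einsum('bkn,bcn->bkc', attn, flat)                   # (B, K, C)
        weights = torch.softmax(
            self.contribution_gate(part_feats.flatten(1)), dim=-1)              # (B, K)
        part_logits = torch.stack(
            [clf(part_feats[:, k]) for k, clf in enumerate(self.part_classifiers)],
            dim=1)                                                              # (B, K, classes)
        # Final logits are a contribution-weighted sum over parts.
        return (weights.unsqueeze(-1) * part_logits).sum(dim=1), weights
```

The returned `weights` vector is what makes the decision human-readable: it reports how strongly each conceptual part influenced the prediction and can be inspected or visualized alongside the per-part attention maps.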

Acknowledgments

This work is partially supported by ARC under Grants DP180100106 and DP200101328. Xiaowei Zhou is supported by a Data61 Student Scholarship from CSIRO.

Author information

Corresponding author

Correspondence to Xiaowei Zhou.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhou, X., Yin, J., Tsang, I., Wang, C. (2021). Human-Understandable Decision Making for Visual Recognition. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science (LNAI), vol. 12714. Springer, Cham. https://doi.org/10.1007/978-3-030-75768-7_14

  • DOI: https://doi.org/10.1007/978-3-030-75768-7_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-75767-0

  • Online ISBN: 978-3-030-75768-7

  • eBook Packages: Computer Science, Computer Science (R0)
