Quantifying the Amount of Visual Information Used by Neural Caption Generators
This paper addresses the sensitivity of neural image caption generators to their visual input. We report a sensitivity analysis and an omission analysis based on image foils, showing that the extent to which image captioning architectures retain, and are sensitive to, visual information varies with the type of word being generated and with its position in the caption as a whole. We motivate this work in the context of the field's broader goal of achieving more explainable AI.
Keywords: Image captioning · Sensitivity analysis · Explainable AI
The research in this paper is partially funded by the Endeavour Scholarship Scheme (Malta). Scholarships are part-financed by the European Union, European Social Fund (ESF), under Operational Programme II, Cohesion Policy 2014–2020, "Investing in human capital to create more opportunities and promote the well-being of society".
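The omission analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that we already have per-word probabilities from a caption generator conditioned on the true image versus a blanked (omitted) image; the function names and the exact scoring formula are illustrative assumptions, not the paper's precise formulation.

```python
# Hedged sketch: per-word omission scores for a generated caption.
# Assumption: p_with / p_without are probabilities the captioner assigns
# to each ground-truth word given the true image vs. a blanked image.

def omission_score(p_with_image, p_without_image):
    """Drop in a word's probability when the visual input is withheld.

    A larger score suggests the word relied more on visual information.
    """
    return p_with_image - p_without_image

def caption_omission_scores(caption, p_with, p_without):
    """Pair each caption token with its omission score."""
    return [(word, omission_score(pw, po))
            for word, pw, po in zip(caption, p_with, p_without)]

# Toy example with made-up probabilities (not real model outputs):
caption   = ["a",  "dog", "on", "grass"]
p_with    = [0.30, 0.60, 0.25, 0.50]   # P(word | true image, prefix)
p_without = [0.29, 0.10, 0.24, 0.20]   # P(word | blank image, prefix)

scores = caption_omission_scores(caption, p_with, p_without)
```

In this toy setup the content words ("dog", "grass") show larger probability drops than the function words, which is the kind of word-category effect the analysis is designed to surface.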