Understanding Deep Neural Networks via a Filter Sensitive Area Generation Network
Deep convolutional networks have recently gained much attention because of their impressive performance on visual tasks. However, it is still not clear why they achieve such success. In this paper, a novel approach called the Filter Sensitive Area Generation Network (FSAGN) is proposed to interpret what the convolutional filters of a trained CNN have learnt. Given any trained CNN model, the proposed method aims to figure out which object part each filter in a high conv-layer represents, by generating an appropriate input image mask that filters out unrelated areas. To obtain such a mask, a mask generation network is designed, and a corresponding loss function is defined to evaluate the change in feature maps before and after the mask operation. Experiments on multiple datasets and networks show that FSAGN clarifies the knowledge representation of each filter and how small disturbances on specific object parts affect the performance of CNNs.
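The core idea above, namely scoring a mask by how little it changes a filter's feature map while covering as little of the image as possible, can be sketched as follows. This is a minimal illustrative toy, not the paper's actual architecture or loss: `feature_map` is a stand-in for one conv filter's response, and `mask_consistency_loss` with its weight `lam` is a hypothetical objective combining a feature-change term with a mask-area penalty.

```python
import numpy as np

def feature_map(image, kernel):
    """Toy stand-in for one conv filter's response: valid 2-D cross-correlation."""
    h, w = kernel.shape
    H, W = image.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out

def mask_consistency_loss(image, mask, kernel, lam=0.1):
    """Hypothetical FSAGN-style objective (an assumption, not the paper's exact loss):
    keep the filter's response unchanged under the mask (first term) while
    encouraging the mask to keep as little of the image as possible (second term)."""
    f_orig = feature_map(image, kernel)
    f_masked = feature_map(image * mask, kernel)
    change = np.mean((f_orig - f_masked) ** 2)  # feature-map change before/after masking
    area = np.mean(mask)                        # fraction of the image the mask keeps
    return change + lam * area
```

With an all-ones mask the change term vanishes and only the area penalty `lam` remains; a mask that blanks out the filter's sensitive area instead pays a large change term, so minimising this loss over masks would localise the image region the filter depends on.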
Keywords: Convolutional neural network · Interpretability · Knowledge representations
This work was supported in part by the National Key Research and Development Program of China (2017YFB1300203), in part by the National Natural Science Foundation of China under Grant 91648205.