Not Just a Matter of Semantics: The Relationship Between Visual and Semantic Similarity
Abstract
Knowledge transfer, zero-shot learning, and semantic image retrieval aim to improve accuracy by exploiting semantic information, e.g., from WordNet. The underlying assumption is that this information can augment or replace missing visual data, i.e., labeled training images, because semantic similarity correlates with visual similarity.
This assumption may seem trivial, but it is crucial for the application of such semantic methods, and any violation can cause mispredictions. It is therefore important to examine the visual-semantic relationship for a given target problem. In this paper, we use five semantic and five visual similarity measures to analyze the relationship thoroughly without relying too heavily on any single definition.
We postulate and verify three highly consequential hypotheses about this relationship. Our results show that it does exist and that WordNet semantic similarity carries more information about visual similarity than the mere knowledge that “different classes look different”. They further suggest that classification is not the ideal application for semantic methods and that wrong semantic information is much worse than none.
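At its core, the analysis described above amounts to computing a semantic similarity (e.g., a WordNet-based measure) and a visual similarity for every pair of classes and measuring the rank correlation between the two. The following minimal sketch illustrates that idea using NLTK's WordNet interface and Spearman's rank correlation; the class labels and the random "visual" features are hypothetical placeholders, not the paper's data or its specific similarity measures.

```python
# Minimal sketch (not the authors' code): correlate a WordNet-based semantic
# similarity with a visual similarity over all class pairs via Spearman's rho.
# Assumes NLTK (with the WordNet corpus downloaded), NumPy, and SciPy.
from itertools import combinations

import numpy as np
from nltk.corpus import wordnet as wn
from scipy.stats import spearmanr

# Hypothetical class labels; in practice these come from the target dataset.
classes = ["cat", "dog", "car", "truck", "apple"]
synsets = {c: wn.synsets(c, pos=wn.NOUN)[0] for c in classes}

# Placeholder visual similarity: a random set of unit-norm "class prototype"
# features stands in for, e.g., mean deep features per class.
rng = np.random.default_rng(0)
feats = rng.normal(size=(len(classes), 16))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
visual_sim = feats @ feats.T  # cosine similarity between class prototypes

sem, vis = [], []
for (i, a), (j, b) in combinations(enumerate(classes), 2):
    # Wu-Palmer similarity is just one of several WordNet measures one could use.
    sem.append(synsets[a].wup_similarity(synsets[b]))
    vis.append(visual_sim[i, j])

rho, p = spearmanr(sem, vis)
print(f"Spearman rank correlation: {rho:.3f} (p={p:.3f})")
```

Swapping in a different semantic measure (e.g., path- or information-content-based) or a different visual similarity only changes how the two lists are filled; the rank-correlation comparison stays the same.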
Acknowledgements
This work was supported by the DAWI research infrastructure project, funded by the federal state of Thuringia (grant no. 2017 FGI 0031), including access to computing and storage facilities.