Integrating Domain Knowledge: Using Hierarchies to Improve Deep Classifiers

  • Clemens-Alexander Brust
  • Joachim Denzler
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12046)


One of the most prominent problems in machine learning in the age of deep learning is the availability of sufficiently large annotated datasets. For specific domains, e.g., animal species, a long-tail distribution means that some classes are observed and annotated insufficiently. Additional labels can be prohibitively expensive, e.g., because domain experts need to be involved. However, further information is available that, to the best of our knowledge, has not yet been adequately exploited.

In this paper, we propose to make use of preexisting class hierarchies like WordNet to integrate additional domain knowledge into classification. We encode the properties of such a class hierarchy into a probabilistic model. From there, we derive a novel label encoding and a corresponding loss function. On the ImageNet and NABirds datasets, our method offers relative improvements of \(10.4\%\) and \(9.6\%\) in accuracy over the baseline, respectively. After less than a third of the training time, it already matches the baseline’s fine-grained recognition performance. Both results show that our suggested method is efficient and effective.
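The abstract does not spell out the label encoding or the loss, so the following is only a minimal sketch of one plausible hierarchy-aware encoding, not necessarily the authors' formulation. It assumes a toy hierarchy (the `PARENT` mapping and its class names are hypothetical), marks a class together with all of its ancestors as positive, and scores predictions with a plain binary cross-entropy over all concepts (an assumed loss, chosen for illustration).

```python
import math

# Hypothetical toy hierarchy (child -> parent); None marks the root.
PARENT = {
    "animal": None,
    "bird": "animal",
    "dog": "animal",
    "sparrow": "bird",
    "eagle": "bird",
}
CONCEPTS = sorted(PARENT)  # fixed concept order used by the encoding


def ancestors_and_self(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def encode(label):
    """Binary vector with a 1 for the label and for each of its ancestors."""
    positive = set(ancestors_and_self(label))
    return [1.0 if c in positive else 0.0 for c in CONCEPTS]


def binary_cross_entropy(target, predicted, eps=1e-7):
    """Mean per-concept binary cross-entropy (assumed loss, for illustration)."""
    total = 0.0
    for t, p in zip(target, predicted):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(target)


target = encode("sparrow")
prediction = [0.9, 0.8, 0.1, 0.2, 0.7]  # made-up per-concept probabilities
loss = binary_cross_entropy(target, prediction)
```

Under this encoding, confusing a sparrow with an eagle disagrees with the target in fewer entries than confusing it with a dog, because the shared `animal` and `bird` entries still match. This is one way a hierarchy can inject domain knowledge into the training signal.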


Keywords: Class hierarchy · Knowledge integration · Hierarchical classification


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Computer Vision Group, Friedrich Schiller University Jena, Jena, Germany
  2. Michael Stifel Center Jena, Jena, Germany
