How to Read Paintings: Semantic Art Understanding with Multi-modal Retrieval

  • Noa Garcia
  • George Vogiatzis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11130)


Automatic art analysis has mostly focused on classifying artworks into different artistic styles. However, understanding an artistic representation involves more complex processes, such as identifying the elements in the scene or recognizing author influences. We present SemArt, a multi-modal dataset for semantic art understanding. SemArt is a collection of fine-art painting images in which each image is associated with a number of attributes and a textual artistic comment, such as those that appear in art catalogues or museum collections. To evaluate semantic art understanding, we propose the Text2Art challenge, a multi-modal retrieval task in which relevant paintings are retrieved according to an artistic text, and vice versa. We also propose several models for encoding visual and textual artistic representations into a common semantic space. Our best approach finds the correct image within the top 10 ranked images in 45.5% of the test samples. Moreover, our models show remarkable levels of art understanding when compared against human evaluation.
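The retrieval setup described above can be sketched as follows: image and text features are projected into a common semantic space, and paintings are ranked by cosine similarity to an artistic comment. This is a minimal illustrative sketch, not the paper's actual model; the feature dimensions, random projections, and function names are all assumptions (a trained system would learn the projections with a ranking loss over real CNN and text features).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-extracted features (illustrative, not from the paper):
# 512-d visual features (e.g. from a CNN) and 300-d text features.
img_feats = rng.normal(size=(5, 512))   # 5 paintings
txt_feats = rng.normal(size=(5, 300))   # 5 artistic comments, aligned by index

# Projections into a shared 128-d semantic space.
# In practice these would be trained; here they are random placeholders.
W_img = rng.normal(size=(512, 128))
W_txt = rng.normal(size=(300, 128))

def embed(x, W):
    """Project features and L2-normalise so dot products are cosine similarities."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

img_emb = embed(img_feats, W_img)
txt_emb = embed(txt_feats, W_txt)

def rank_paintings(query_text_emb, img_emb):
    """Return painting indices sorted by decreasing cosine similarity to the text."""
    sims = img_emb @ query_text_emb
    return np.argsort(-sims)

# Text-to-image retrieval for the first comment; a Recall@k-style metric
# would check whether the matching painting appears in the top k results.
ranking = rank_paintings(txt_emb[0], img_emb)
```

The same embeddings support the reverse direction (image-to-text) by swapping the roles of the two matrices in `rank_paintings`.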


Keywords: Semantic art understanding · Art analysis · Image-text retrieval · Multi-modal retrieval



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Aston University, Birmingham, UK
