
Weakly Supervised Object Detection in Artworks

  • Nicolas Gonthier
  • Yann Gousseau
  • Saïd Ladjal
  • Olivier Bonfait
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11130)

Abstract

We propose a method for the weakly supervised detection of objects in paintings. At training time, only image-level annotations are needed. This, combined with the efficiency of our multiple-instance learning method, enables one to learn new classes on the fly from globally annotated databases, avoiding the tedious task of manually marking objects. We show on several databases that dropping the instance-level annotations yields only mild performance losses. We also introduce a new database, IconArt, on which we perform detection experiments on classes that could not be learned on photographs, such as Jesus Child or Saint Sebastian. To the best of our knowledge, these are the first experiments dealing with the automatic (and in our case weakly supervised) detection of iconographic elements in paintings. We believe that such a method is of great benefit for helping art historians to explore large digital databases.
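The core idea of the abstract, learning a detector from image-level labels alone via multiple-instance learning, can be illustrated with a minimal sketch: each image is a bag of candidate regions, and a region classifier is trained so that the maximum region score agrees with the image-level label. The sketch below is a generic illustration under assumed inputs (precomputed region features, e.g. from a pretrained detector, and ±1 image labels) using a simple hinge-loss update; the function names and all details are hypothetical and do not reproduce the paper's implementation.

```python
# Minimal sketch of the multiple-instance learning principle described in
# the abstract. Each image is a "bag" of candidate regions; only the
# image-level label is known; the image score is the max over region scores.
# All names, shapes and the hinge-loss update are illustrative assumptions,
# not the authors' implementation.
import numpy as np

def train_mil_classifier(bags, labels, epochs=50, lr=0.1, reg=1e-4):
    """bags: list of (n_regions_i, d) arrays of region features
    (e.g. from a pretrained detector); labels: +1/-1 per image."""
    d = bags[0].shape[1]
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for x, y in zip(bags, labels):
            scores = x @ w + b            # one score per candidate region
            k = int(np.argmax(scores))    # the top region carries the image score
            if y * scores[k] < 1.0:       # hinge loss on the image-level label
                w += lr * (y * x[k] - reg * w)  # subgradient step
                b += lr * y
    return w, b

def detect(bag, w, b):
    """At test time, the highest-scoring region is the predicted location."""
    scores = bag @ w + b
    return int(np.argmax(scores)), float(np.max(scores))
```

This max-over-regions formulation is what lets weak, image-level supervision produce a localizer: the same maximum used during training is read off at test time as the detection.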

Keywords

Weakly supervised detection · Transfer learning · Art analysis · Multiple instance learning

Acknowledgements

This work is supported by the “IDI 2017” project funded by the IDEX Paris-Saclay, ANR-11-IDEX-0003-02.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. LTCI, Télécom ParisTech, Université Paris-Saclay, Paris, France
  2. Université de Bourgogne, UMR CNRS UB 5605, Dijon, France
