
Object instance identification with fully convolutional networks

Published in Multimedia Tools and Applications

Abstract

This paper presents a novel approach to instance search and object detection, applied to museum visits. The approach relies on fully convolutional networks (FCNs) to obtain region proposals and object representations. Our proposal consists of four steps: (i) a classical convolutional network is first fine-tuned as a classifier on the dataset; (ii) from this network we build a second, fully convolutional network, also trained as a classifier, that attends to all regions of the corpus images; (iii) this network is then used to define global image descriptors in a siamese architecture trained with triplets of images; and (iv) these descriptors are finally used for retrieval with a classical scalar product between vectors. Our framework has the following features: i) it is well suited to small datasets with low object variability, as we use transfer learning; ii) it does not require any additional component in the network, as we rely only on classical (i.e. not fully convolutional) and fully convolutional networks; and iii) it does not need region annotations in the dataset, as it handles regions in an unsupervised way. Through multiple experiments on two image datasets taken from museum visits, we detail the effect of each parameter and show that the descriptors obtained with our proposed network outperform those from previous state-of-the-art approaches.
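The last two steps of the pipeline, learning global descriptors with a triplet (siamese) loss and ranking corpus images by scalar product, can be sketched as below. This is a minimal NumPy illustration, not the paper's implementation: the function names, margin value, and 128-d random descriptors are all illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Unit-normalize descriptors so the scalar product equals cosine similarity.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge loss on squared Euclidean distances between L2-normalized
    # descriptors: push the anchor-positive distance below the
    # anchor-negative distance by at least `margin`.
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin)

def retrieve(query, corpus):
    # Retrieval step: rank corpus descriptors by scalar product with the query.
    scores = corpus @ query
    return np.argsort(-scores)

# Toy usage with random 128-d descriptors standing in for FCN outputs.
rng = np.random.default_rng(0)
corpus = l2_normalize(rng.normal(size=(5, 128)))
query = l2_normalize(corpus[3] + 0.01 * rng.normal(size=128))  # near-duplicate of item 3
ranking = retrieve(query, corpus)
assert ranking[0] == 3  # the near-duplicate is ranked first
```

Because the descriptors are unit-normalized, ranking by scalar product is equivalent to ranking by cosine similarity, which is why a plain dot product suffices at retrieval time.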




Notes

  1. https://www.gerhard-richter.com/en/art/microsites/4900-colours


Author information

Correspondence to Maxime Portaz.


About this article


Cite this article

Portaz, M., Kohl, M., Chevallet, JP. et al. Object instance identification with fully convolutional networks. Multimed Tools Appl 78, 2747–2764 (2019). https://doi.org/10.1007/s11042-018-5798-7

