
Object instance identification with fully convolutional networks

Published in Multimedia Tools and Applications

Abstract

This paper presents a novel approach to instance search and object detection, applied to museum visits. The approach relies on fully convolutional networks (FCNs) to obtain region proposals and object representations. Our proposal consists of four steps: (i) a classical convolutional network is first fine-tuned as a classifier on the dataset; (ii) from this network we build a second, fully convolutional network, also trained as a classifier, that attends to all regions of the corpus images; (iii) this network is then used to define global image descriptors in a siamese architecture trained with triplets of images; and (iv) these descriptors are finally used for retrieval with a classical scalar product between vectors. Our framework has the following features: i) it is well suited to small datasets with low object variability, as we use transfer learning; ii) it does not require any additional component in the network, as we rely only on classical (i.e. not fully convolutional) and fully convolutional networks; and iii) it does not need region annotations in the dataset, as it handles regions in an unsupervised way. Through multiple experiments on two image datasets taken from museum visits, we detail the effect of each parameter and show that the descriptors obtained with our proposed network outperform those from previous state-of-the-art approaches.
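The last two steps of the pipeline, learning global descriptors with a triplet (siamese) loss and ranking corpus images by scalar product, can be sketched as below. This is a minimal NumPy illustration, not the paper's implementation: the function names, margin value, and 128-d random descriptors are all illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Unit-normalize descriptors so the scalar product equals cosine similarity.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge loss on squared Euclidean distances between L2-normalized
    # descriptors: push the anchor-positive distance below the
    # anchor-negative distance by at least `margin`.
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin)

def retrieve(query, corpus):
    # Retrieval step: rank corpus descriptors by scalar product with the query.
    scores = corpus @ query
    return np.argsort(-scores)

# Toy usage with random 128-d descriptors standing in for FCN outputs.
rng = np.random.default_rng(0)
corpus = l2_normalize(rng.normal(size=(5, 128)))
query = l2_normalize(corpus[3] + 0.01 * rng.normal(size=128))  # near-duplicate of item 3
ranking = retrieve(query, corpus)
assert ranking[0] == 3  # the near-duplicate is ranked first
```

Because the descriptors are unit-normalized, ranking by scalar product is equivalent to ranking by cosine similarity, which is why a plain dot product suffices at retrieval time.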




Notes

  1. https://www.gerhard-richter.com/en/art/microsites/4900-colours


Author information

Correspondence to Maxime Portaz.


About this article


Cite this article

Portaz, M., Kohl, M., Chevallet, JP. et al. Object instance identification with fully convolutional networks. Multimed Tools Appl 78, 2747–2764 (2019). https://doi.org/10.1007/s11042-018-5798-7

