Abstract
In the past 5 years, deep learning has achieved remarkable breakthroughs, mainly attributed to the success of convolutional neural networks (CNN) on vision applications like ImageNet classification. In this chapter, we are interested in deep learning-based descriptors for object instance search in images. Specifically, we propose to tackle some practical issues of existing CNN models, with a focus on resource-efficient yet effective deep descriptors extracted from CNN. (1) How to achieve compact image representations (e.g., hundreds of bits) from deep neural networks in an end-to-end manner? (2) How to encode scale/rotation invariances into deep CNN architecture? To address the issues, our approach has two novel contributions. First, we propose Restricted Boltzmann Machine with a novel batch-level regularization scheme specifically designed for the purpose of descriptor hashing (RBMH), which is able to match the performance of the uncompressed descriptor with tiny 32–256 bit hashes. Second, inspired from invariance theory, we propose Nested Invariance Pooling (NIP), a method for computing group-invariant transformations with feed-forward neural networks. We specifically incorporate scale, translation, and rotation invariances, but the scheme can be extended to any arbitrary sets of transformations. A thorough empirical evaluation with state of the art shows that the results obtained both with the NIP descriptors and the NIP+RBMH hashes are consistently outstanding across a wide range of datasets.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
CDVS Patches (2013). http://blackhole1.stanford.edu/vijayc/cdvs_patches.tar
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., et al.: Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016)
Anselmi, F., Leibo, J.Z., Rosasco, L., Mutch, J., Tacchetti, A., Poggio, T.: Magic materials: a theory of deep hierarchical architectures for learning sensory representations. CBCL paper (2013)
Anselmi, F., Leibo, J.Z., Rosasco, L., Mutch, J., Tacchetti, A., Poggio, T.: Unsupervised learning of invariant representations in hierarchical architectures. arXiv preprint arXiv:1311.4158 (2013)
Anselmi, F., Poggio, T.: Representation learning in sensory cortex: a theory. Tech. rep., Center for Brains, Minds and Machines (CBMM) (2014)
Arandjelovic, R., Zisserman, A.: All about vlad. In: Computer Vision and Pattern Recognition (CVPR), pp. 1578–1585 (2013)
Azizpour, H., Razavian, A., Sullivan, J., Maki, A., Carlsson, S.: From generic to specific deep representations for visual recognition. In: Computer Vision and Pattern Recognition Workshops (CVPR), pp. 36–45 (2015)
Babenko, A., Lempitsky, V.: Aggregating local deep features for image retrieval. In: International Conference on Computer Vision (ICCV), pp. 1269–1277 (2015)
Babenko, A., Slesarev, A., Chigorin, A., Lempitsky, V.: Neural codes for image retrieval. In: European Conference on Computer Vision (ECCV), pp. 584–599. Springer (2014)
Bromley, J., Guyon, I., LeCun, Y., Sackinger, E., Shah, R.: Signature verification using a Siamese time delay neural network. In: J. Cowan, G. Tesauro (eds.) Neural Information Processing Systems (NIPS), vol. 6. Morgan Kaufmann (1993)
Chandrasekhar, V., Lin, J., Morère, O., Goh, H., Veillard, A.: A practical guide to CNNs and Fisher vectors for image instance retrieval. Signal Processing (SIGPRO) (2016)
Chandrasekhar, V., Lin, J., Morère, O., Veillard, A., Goh, H.: Compact global descriptors for visual search. In: Data Compression Conference (DCC), pp. 333–342. IEEE (2015)
Chandrasekhar, V., Makar, M., Takacs, G., Chen, D., Tsai, S.S., Cheung, N.M., Grzeszczuk, R., Reznik, Y., Girod, B.: Survey of sift compression schemes. In: Mobile Multimedia Processing Workshop (MMP), pp. 35–40. Citeseer (2010)
Chandrasekhar, V., Takacs, G., Chen, D., Tsai, S., Grzeszczuk, R., Girod, B.: Chog: Compressed histogram of gradients a low bit-rate feature descriptor. In: Computer Vision and Pattern Recognition (CVPR), pp. 2504–2511. IEEE (2009)
Chen, D., Tsai, S., Chandrasekhar, V., Takacs, G., Chen, H., Vedantham, R., Grzeszczuk, R., Girod, B.: Residual enhanced visual vectors for on-device image matching. Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR) pp. 850–854 (2011). URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6190128
Chen, D., Tsai, S., Chandrasekhar, V., Takacs, G., Vedantham, R., Grzeszczuk, R., Girod, B.: Residual enhanced visual vector as a compact signature for mobile visual search. Signal Processing (SIGPRO) 93(8), 2316–2327 (2013)
Chopra, S., Hadsell, R., LeCun, Y.: Learning a similarity metric discriminatively, with application to face verification. In: Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 539–546. IEEE (2005)
Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: Annual Symposium on Computational Geometry (SoCG), pp. 253–262. ACM (2004)
Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24(6), 381–395 (1981)
Girshick, R.: Fast r-cnn. In: International Conference on Computer Vision (ICCV), pp. 1440–1448 (2015)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Computer Vision and Pattern Recognition (CVPR), pp. 580–587. IEEE (2014)
Goh, H., Thome, N., Cord, M., Lim, J.H.: Unsupervised and supervised visual codes with restricted Boltzmann machines. In: European Conference on Computer Vision (ECCV), pp. 298–311. Springer (2012)
Gong, Y., Kumar, S., Rowley, H., Lazebnik, S.: Learning binary codes for high-dimensional data using bilinear projections. In: Computer Vision and Pattern Recognition (CVPR), pp. 484–491 (2013)
Gong, Y., Lazebnik, S.: Iterative quantization: A procrustean approach to learning binary codes. In: Computer Vision and Pattern Recognition (CVPR), pp. 817–824. IEEE (2011)
Gong, Y., Lazebnik, S.: Iterative quantization: A procrustean approach to learning binary codes. Computer Vision and Pattern Recognition (CVPR) pp. 817–824 (2011)
Gong, Y., Wang, L., Guo, R., Lazebnik, S.: Multi-scale orderless pooling of deep convolutional activation features. In: European Conference on Computer Vision (ECCV), pp. 392–407. Springer (2014)
Gordo, A., Perronnin, F., Gong, Y., Lazebnik, S.: Asymmetric distances for binary embeddings. Pattern Analysis and Machine Intelligence (PAMI) 36(1), 33–47 (2014)
Gordo, A., Rodríguez-Serrano, J.A., Perronnin, F., Valveny, E.: Leveraging category-level labels for instance-level image retrieval. In: Computer Vision and Pattern Recognition (CVPR), pp. 3045–3052. IEEE (2012)
Grauman, K., Fergus, R.: Learning binary hash codes for large-scale image search. In: Machine Learning for Computer Vision, pp. 49–87. Springer (2013)
Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 1735–1742. IEEE (2006)
Heo, J.P., Lee, Y., He, J., Chang, S.F., Yoon, S.E.: Spherical hashing. In: Computer Vision and Pattern Recognition (CVPR), pp. 2957–2964. IEEE (2012)
Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Computation (NC) 14(8), 1771–1800 (2002)
Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Computation (NC) 18(7), 1527–1554 (2006)
Hinton, G.E., Sejnowski, T.J.: Learning and relearning in Boltzmann machines. Parallel distributed processing: Explorations in the microstructure of cognition 1, 282–317 (1986)
Huiskes, M.J., Thomee, B., Lew, M.S.: New trends and ideas in visual concept detection: the mir flickr retrieval evaluation initiative. In: Multimedia Information Retrieval (MIR), pp. 527–536. ACM (2010)
Jégou, H., Chum, O.: Negative evidences and co-occurences in image retrieval: The benefit of pca and whitening. In: European Conference on Computer Vision (ECCV), pp. 774–787. Springer (2012)
Jégou, H., Douze, M., Schmid, C.: On the burstiness of visual elements. In: Computer Vision and Pattern Recognition (CVPR), pp. 1169–1176. IEEE (2009)
Jégou, H., Perronnin, F., Douze, M., Sanchez, J., Perez, P., Schmid, C.: Aggregating local image descriptors into compact codes. Pattern Analysis and Machine Intelligence (PAMI) 34(9), 1704–1716 (2012)
Jégou, H., Zisserman, A.: Triangulation embedding and democratic aggregation for image search. In: Computer Vision and Pattern Recognition (CVPR), pp. 3310–3317. IEEE (2014)
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: Convolutional architecture for fast feature embedding. In: Conference on Multimedia (ACMMM), pp. 675–678. ACM (2014)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Neural Information Processing Systems (NIPS) pp. 1–9 (2012). URL https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
Kulis, B., Grauman, K.: Kernelized locality-sensitive hashing for scalable image search. In: International Conference on Computer Vision (ICCV), pp. 2130–2137. IEEE (2009)
Liao, Q., Leibo, J.Z., Poggio, T.: Learning invariant representations and applications to face verification. In: Neural Information Processing Systems (NIPS), pp. 3057–3065 (2013)
Lin, J., Duan, L.Y., Huang, T., Gao, W.: Robust Fisher codes for large scale image retrieval. In: International Conference on Acoustics and Signal Processing (ICASSP) (2013)
Lin, J., Duan, L.y., Wang, S., Bai, Y., Lou, Y., Chandrasekhar, V., Huang, T., Kot, A., Gao, W.: Hnip: Compact deep invariant representations for video matching, localization and retrieval. Transactions on Multimedia (TMM) (2017)
Lin, J., Morère, O., Chandrasekhar, V., Veillard, A., Goh, H.: Co-sparsity regularized deep hashing for image instance retrieval. In: International Conference on Image Processing (ICIP). IEEE (2016)
Lin, J., Morère, O., Petta, J., Chandrasekhar, V., Veillard, A.: Tiny descriptors for image retrieval with unsupervised triplet hashing. In: Data Compression Conference (DCC) (2016)
Lin, J., Morère, O., Veillard, A., Duan, L.Y., Goh, H., Chandrasekhar, V.: Deephash for image instance retrieval: Getting regularization, depth and fine-tuning right. In: ACM International Conference on Multimedia Retrieval (2017)
Liu, W., Wang, J., Ji, R., Jiang, Y.G., Chang, S.F.: Supervised hashing with kernels. In: Computer Vision and Pattern Recognition (CVPR), pp. 2074–2081. IEEE (2012)
Lou, Y., Bai, Y., Lin, J., Wang, S., Chen, J., Chandrasekhar, V., Duan, L.y., Tiejun, H., Kot, A., Gao, W.: Compact deep invariant descriptors for video retrieval. In: Data Compression Conference (DCC) (2017)
Lowe, D.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV) 60(2), 91–110 (2004)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV) 60(2), 91–110 (2004). doi: 10.1023/B:VISI.0000029664.99615.94. URL http://link.springer.com/10.1023/B:VISI.0000029664.99615.94
Morère, O., Lin, J., Veillard, A., Duan, L.y., Chandrasekhar, V., Poggio, T.: Nested invariance pooling and rbm hashing for image instance retrieval. In: ACM International Conference on Multimedia Retrieval (2017)
Morère, O., Veillard, A., Lin, J., Petta, J., Chandrasekhar, V., Poggio, T.: Group invariant deep representations for image instance retrieval. In: AAAI Symposium on Science of Intelligence (2017)
Nair, V., Hinton, G.E.: 3D object recognition with deep belief nets. In: Neural Information Processing Systems (NIPS), pp. 1339–1347 (2009)
Norouzi, M., Blei, D.M.: Minimal loss hashing for compact binary codes. In: International Conference on Machine Learning (ICML), pp. 353–360 (2011)
Perronnin, F., Larlus, D.: Fisher vectors meet neural networks: A hybrid classification architecture. In: Computer Vision and Pattern Recognition (CVPR), pp. 3743–3752 (2015)
Perronnin, F., Liu, Y., Sánchez, J., Poirier, H.: Large-scale image retrieval with compressed fisher vectors. In: Computer Vision and Pattern Recognition (CVPR), pp. 3384–3391. IEEE (2010)
Raginsky, M., Lazebnik, S.: Locality-sensitive binary codes from shift-invariant kernels. In: Neural Information Processing Systems (NIPS), pp. 1509–1517 (2009)
Razavian, A., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: an astounding baseline for recognition. In: Computer Vision and Pattern Recognition Workshop (CVPR), pp. 806–813 (2014)
Salakhutdinov, R., Mnih, A., Hinton, G.: Restricted Boltzmann machines for collaborative filtering. In: International Conference on Machine Learning (ICML), pp. 791–798. ACM (2007)
Sharif Razavian, A., Sullivan, J., Maki, A., Carlsson, S.: A baseline for visual instance retrieval with deep convolutional networks. In: International Conference on Learning Representations (ICLR). ICLR (2015)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (ICLR) (2015)
Smolensky, P.: Information processing in dynamical systems: foundations of harmony theory. In: Parallel distributed processing: explorations in the microstructure of cognition, pp. 194–281. MIT Press (1986)
Tolias, G., Avrithis, Y., Jégou, H.: To aggregate or not to aggregate: Selective match kernels for image search. In: International Conference on Computer Vision (ICCV), pp. 1401–1408 (2013)
Tolias, G., Sicre, R., Jégou, H.: Particular object retrieval with integral max-pooling of CNN activations. In: International Conference on Learning Representations (ICLR) (2016)
Torralba, A., Fergus, R., Weiss, Y.: Small codes and large image databases for recognition. In: Computer Vision and Pattern Recognition (CVPR), pp. 1–8. IEEE (2008)
Tuytelaars, T.: Dense interest points. In: Computer Vision and Pattern Recognition (CVPR), pp. 2281–2288. IEEE (2010)
Wang, J., Kumar, S., Chang, S.F.: Semi-supervised hashing for scalable image retrieval. In: Computer Vision and Pattern Recognition (CVPR), pp. 3424–3431. IEEE (2010)
Weiss, Y., Torralba, A., Fergus, R.: Spectral hashing. In: Neural Information Processing Systems (NIPS), pp. 1753–1760 (2009)
Winder, S., Hua, G., Brown, M.: Picking the best daisy. In: Computer Vision and Pattern Recognition (CVPR), pp. 178–185. IEEE (2009)
Yahoo: Yahoo! 100 million image data set. http://webscope.sandbox.yahoo.com/
Yeo, C., Ahammad, P., Ramchandran, K.: Rate-efficient visual correspondences using random projections. In: International Conference on Image Processing (ICIP), pp. 217–220. IEEE (2008)
Zhang, C., Evangelopoulos, G., Voinea, S., Rosasco, L., Poggio, T.: A deep representation for invariance and music classification. In: Acoustics, Speech and Signal Processing (ICASSP), pp. 6984–6988. IEEE (2014)
Zhang, T., Du, C., Wang, J.: Composite quantization for approximate nearest neighbor search. In: International Conference on Machine Learning (ICML), pp. 838–846 (2014)
Zhang, T., Qi, G.J., Tang, J., Wang, J.: Sparse composite quantization. In: Computer Vision and Pattern Recognition (CVPR), pp. 4548–4556 (2015)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Lin, J., Morère, O., Veillard, A., Chandrasekhar, V. (2019). Deep Learning-Based Descriptors for Object Instance Search. In: Jiang, X., Hadid, A., Pang, Y., Granger, E., Feng, X. (eds) Deep Learning in Object Detection and Recognition. Springer, Singapore. https://doi.org/10.1007/978-981-10-5152-4_8
Download citation
DOI: https://doi.org/10.1007/978-981-10-5152-4_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5151-7
Online ISBN: 978-981-10-5152-4
eBook Packages: Computer ScienceComputer Science (R0)