
Deep Learning-Based Descriptors for Object Instance Search

Chapter in: Deep Learning in Object Detection and Recognition

Abstract

In the past five years, deep learning has achieved remarkable breakthroughs, driven largely by the success of convolutional neural networks (CNNs) on vision tasks such as ImageNet classification. In this chapter, we study deep learning-based descriptors for object instance search in images. Specifically, we tackle two practical issues of existing CNN models, with a focus on deep descriptors extracted from CNNs that are resource-efficient yet effective: (1) how to obtain compact image representations (e.g., hundreds of bits) from deep neural networks in an end-to-end manner, and (2) how to encode scale and rotation invariance into a deep CNN architecture. Our approach makes two novel contributions to address these issues. First, we propose a Restricted Boltzmann Machine with a novel batch-level regularization scheme designed specifically for descriptor hashing (RBMH), which matches the performance of the uncompressed descriptor with tiny 32–256 bit hashes. Second, inspired by invariance theory, we propose Nested Invariance Pooling (NIP), a method for computing representations that are invariant to groups of transformations with feed-forward neural networks. We specifically incorporate scale, translation, and rotation invariance, but the scheme can be extended to arbitrary sets of transformations. A thorough empirical comparison against the state of the art shows that both the NIP descriptors and the NIP+RBMH hashes yield consistently outstanding results across a wide range of datasets.
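To make the hashing idea in the abstract more concrete, here is a minimal sketch of RBM-based binary hashing. It is not the chapter's RBMH: it assumes a Bernoulli RBM trained with one-step contrastive divergence (CD-1) on descriptors rescaled to [0, 1], and the batch-level regularizer is a hypothetical stand-in that pulls each hidden unit's mean activation over a mini-batch toward 0.5, so that every bit fires on roughly half of the images. Hash bits are hidden-unit probabilities thresholded at 0.5, and retrieval ranks database codes by Hamming distance. The names `RBMHasher` and `hamming_rank` are illustrative only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBMHasher:
    """Bernoulli RBM trained with CD-1 and used as a binary hash function.

    Hypothetical sketch: inputs are assumed to be descriptors rescaled to [0, 1],
    and the batch-level regularizer below is an assumed stand-in (mean activation
    of every hidden unit over a mini-batch pulled toward `target`), not RBMH itself.
    """

    def __init__(self, n_visible, n_bits, lr=0.05, reg=0.1, target=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_bits))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_bits)
        self.lr, self.reg, self.target = lr, reg, target

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def train_batch(self, v0):
        n = v0.shape[0]
        # Positive phase: hidden probabilities given the data, then a binary sample.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase (CD-1): reconstruct visibles, then recompute hiddens.
        v1 = sigmoid(h0_sample @ self.W.T + self.b_v)
        h1 = self.hidden_probs(v1)
        # Contrastive-divergence gradient estimates (ascent direction).
        dW = (v0.T @ h0 - v1.T @ h1) / n
        db_v = (v0 - v1).mean(axis=0)
        db_h = (h0 - h1).mean(axis=0)
        # Assumed batch-level regularizer: gradient ascent on
        # -0.5 * ||target - mean_batch(h0)||^2, which balances each bit.
        err = self.target - h0.mean(axis=0)          # (n_bits,)
        slope = h0 * (1.0 - h0)                       # sigmoid derivative
        dW += self.reg * (v0.T @ slope) / n * err
        db_h += self.reg * slope.mean(axis=0) * err
        self.W += self.lr * dW
        self.b_v += self.lr * db_v
        self.b_h += self.lr * db_h

    def hash(self, v):
        # One bit per hidden unit: 1 if its activation probability exceeds 0.5.
        return (self.hidden_probs(v) > 0.5).astype(np.uint8)


def hamming_rank(query_code, db_codes):
    """Rank database codes by Hamming distance to the query (stable on ties)."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((512, 128))  # synthetic stand-in for [0, 1]-scaled CNN descriptors
    model = RBMHasher(n_visible=128, n_bits=64)
    for epoch in range(20):
        for start in range(0, len(X), 64):
            model.train_batch(X[start:start + 64])
    codes = model.hash(X)
    order, dists = hamming_rank(codes[0], codes)
    print(codes.shape, order[:5], dists[order[:5]])
```

Balancing the bits is one plausible reason to regularize at the batch level: a bit that is almost always on (or off) carries little information, which matters when the whole budget is only 32–256 bits.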

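The nested pooling idea behind NIP, as summarized in the abstract, can be illustrated with a small sketch: responses are computed for a set of transformed copies of the image and then pooled group by group (translations, then scales, then rotations). This is not the authors' implementation. It assumes a toy random 3×3 convolution with ReLU (`toy_features`) in place of a CNN layer, rotations restricted to multiples of 90° so the sampled rotations form a closed group, integer block-averaging factors as the scale transform, and an illustrative choice of pooling moments (mean over translations, mean over scales, max over rotations) that may differ from the configuration used in the chapter. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Candidate pooling moments (the per-group choice below is an assumption).
POOL = {
    "mean": lambda a, axis: a.mean(axis=axis),
    "max": lambda a, axis: a.max(axis=axis),
    "rms": lambda a, axis: np.sqrt((a ** 2).mean(axis=axis)),
}


def toy_features(img, kernels):
    """Hypothetical stand-in for a CNN layer: valid 3x3 convolution + ReLU."""
    windows = sliding_window_view(img, kernels.shape[1:])      # (H', W', kh, kw)
    return np.maximum(np.einsum("hwij,cij->hwc", windows, kernels), 0.0)


def downsample(img, s):
    """Block-average downsampling by integer factor s (assumed scale transform)."""
    h, w = (img.shape[0] // s) * s, (img.shape[1] // s) * s
    return img[:h, :w].reshape(h // s, s, w // s, s).mean(axis=(1, 3))


def nested_invariance_pool(img, kernels, n_rot=4, scales=(1, 2),
                           moments=("mean", "mean", "max")):
    """Pool over translations, then scales, then rotations; L2-normalize."""
    t_pool, s_pool, r_pool = (POOL[m] for m in moments)
    per_rot = []
    for k in range(n_rot):                       # rotations by k * 90 degrees
        rotated = np.rot90(img, k)
        per_scale = [
            t_pool(toy_features(downsample(rotated, s), kernels), axis=(0, 1))
            for s in scales                      # translation pooling -> (C,)
        ]
        per_rot.append(s_pool(np.stack(per_scale), axis=0))   # scale pooling -> (C,)
    desc = r_pool(np.stack(per_rot), axis=0)     # rotation pooling -> (C,)
    return desc / (np.linalg.norm(desc) + 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    kernels = rng.standard_normal((8, 3, 3))      # 8 random "channels"
    x = rng.random((16, 16))
    d1 = nested_invariance_pool(x, kernels)
    d2 = nested_invariance_pool(np.rot90(x), kernels)
    print(np.allclose(d1, d2))  # True
```

With `n_rot=4`, rotating the input by 90° only reorders the per-rotation responses, and a symmetric pooling moment over that axis ignores the order, so the descriptor is unchanged even though `toy_features` itself is not rotation-equivariant. Scale and translation invariance in this sketch are only approximate, because a finite set of block-averaging factors and image borders do not form an exact group.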

References

  1. CDVS Patches (2013). http://blackhole1.stanford.edu/vijayc/cdvs_patches.tar

    Google Scholar 

  2. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., et al.: Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016)

    Google Scholar 

  3. Anselmi, F., Leibo, J.Z., Rosasco, L., Mutch, J., Tacchetti, A., Poggio, T.: Magic materials: a theory of deep hierarchical architectures for learning sensory representations. CBCL paper (2013)

    Google Scholar 

  4. Anselmi, F., Leibo, J.Z., Rosasco, L., Mutch, J., Tacchetti, A., Poggio, T.: Unsupervised learning of invariant representations in hierarchical architectures. arXiv preprint arXiv:1311.4158 (2013)

    Google Scholar 

  5. Anselmi, F., Poggio, T.: Representation learning in sensory cortex: a theory. Tech. rep., Center for Brains, Minds and Machines (CBMM) (2014)

    Google Scholar 

  6. Arandjelovic, R., Zisserman, A.: All about vlad. In: Computer Vision and Pattern Recognition (CVPR), pp. 1578–1585 (2013)

    Google Scholar 

  7. Azizpour, H., Razavian, A., Sullivan, J., Maki, A., Carlsson, S.: From generic to specific deep representations for visual recognition. In: Computer Vision and Pattern Recognition Workshops (CVPR), pp. 36–45 (2015)

    Google Scholar 

  8. Babenko, A., Lempitsky, V.: Aggregating local deep features for image retrieval. In: International Conference on Computer Vision (ICCV), pp. 1269–1277 (2015)

    Google Scholar 

  9. Babenko, A., Slesarev, A., Chigorin, A., Lempitsky, V.: Neural codes for image retrieval. In: European Conference on Computer Vision (ECCV), pp. 584–599. Springer (2014)

    Google Scholar 

  10. Bromley, J., Guyon, I., LeCun, Y., Sackinger, E., Shah, R.: Signature verification using a Siamese time delay neural network. In: J. Cowan, G. Tesauro (eds.) Neural Information Processing Systems (NIPS), vol. 6. Morgan Kaufmann (1993)

    Google Scholar 

  11. Chandrasekhar, V., Lin, J., Morère, O., Goh, H., Veillard, A.: A practical guide to CNNs and Fisher vectors for image instance retrieval. Signal Processing (SIGPRO) (2016)

    Google Scholar 

  12. Chandrasekhar, V., Lin, J., Morère, O., Veillard, A., Goh, H.: Compact global descriptors for visual search. In: Data Compression Conference (DCC), pp. 333–342. IEEE (2015)

    Google Scholar 

  13. Chandrasekhar, V., Makar, M., Takacs, G., Chen, D., Tsai, S.S., Cheung, N.M., Grzeszczuk, R., Reznik, Y., Girod, B.: Survey of sift compression schemes. In: Mobile Multimedia Processing Workshop (MMP), pp. 35–40. Citeseer (2010)

    Google Scholar 

  14. Chandrasekhar, V., Takacs, G., Chen, D., Tsai, S., Grzeszczuk, R., Girod, B.: Chog: Compressed histogram of gradients a low bit-rate feature descriptor. In: Computer Vision and Pattern Recognition (CVPR), pp. 2504–2511. IEEE (2009)

    Google Scholar 

  15. Chen, D., Tsai, S., Chandrasekhar, V., Takacs, G., Chen, H., Vedantham, R., Grzeszczuk, R., Girod, B.: Residual enhanced visual vectors for on-device image matching. Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR) pp. 850–854 (2011). URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6190128

  16. Chen, D., Tsai, S., Chandrasekhar, V., Takacs, G., Vedantham, R., Grzeszczuk, R., Girod, B.: Residual enhanced visual vector as a compact signature for mobile visual search. Signal Processing (SIGPRO) 93(8), 2316–2327 (2013)

    Article  Google Scholar 

  17. Chopra, S., Hadsell, R., LeCun, Y.: Learning a similarity metric discriminatively, with application to face verification. In: Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 539–546. IEEE (2005)

    Google Scholar 

  18. Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: Annual Symposium on Computational Geometry (SoCG), pp. 253–262. ACM (2004)

    Google Scholar 

  19. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24(6), 381–395 (1981)

    Article  MathSciNet  Google Scholar 

  20. Girshick, R.: Fast r-cnn. In: International Conference on Computer Vision (ICCV), pp. 1440–1448 (2015)

    Google Scholar 

  21. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Computer Vision and Pattern Recognition (CVPR), pp. 580–587. IEEE (2014)

    Google Scholar 

  22. Goh, H., Thome, N., Cord, M., Lim, J.H.: Unsupervised and supervised visual codes with restricted Boltzmann machines. In: European Conference on Computer Vision (ECCV), pp. 298–311. Springer (2012)

    Google Scholar 

  23. Gong, Y., Kumar, S., Rowley, H., Lazebnik, S.: Learning binary codes for high-dimensional data using bilinear projections. In: Computer Vision and Pattern Recognition (CVPR), pp. 484–491 (2013)

    Google Scholar 

  24. Gong, Y., Lazebnik, S.: Iterative quantization: A procrustean approach to learning binary codes. In: Computer Vision and Pattern Recognition (CVPR), pp. 817–824. IEEE (2011)

    Google Scholar 

  25. Gong, Y., Lazebnik, S.: Iterative quantization: A procrustean approach to learning binary codes. Computer Vision and Pattern Recognition (CVPR) pp. 817–824 (2011)

    Google Scholar 

  26. Gong, Y., Wang, L., Guo, R., Lazebnik, S.: Multi-scale orderless pooling of deep convolutional activation features. In: European Conference on Computer Vision (ECCV), pp. 392–407. Springer (2014)

    Google Scholar 

  27. Gordo, A., Perronnin, F., Gong, Y., Lazebnik, S.: Asymmetric distances for binary embeddings. Pattern Analysis and Machine Intelligence (PAMI) 36(1), 33–47 (2014)

    Article  Google Scholar 

  28. Gordo, A., Rodríguez-Serrano, J.A., Perronnin, F., Valveny, E.: Leveraging category-level labels for instance-level image retrieval. In: Computer Vision and Pattern Recognition (CVPR), pp. 3045–3052. IEEE (2012)

    Google Scholar 

  29. Grauman, K., Fergus, R.: Learning binary hash codes for large-scale image search. In: Machine Learning for Computer Vision, pp. 49–87. Springer (2013)

    Google Scholar 

  30. Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 1735–1742. IEEE (2006)

    Google Scholar 

  31. Heo, J.P., Lee, Y., He, J., Chang, S.F., Yoon, S.E.: Spherical hashing. In: Computer Vision and Pattern Recognition (CVPR), pp. 2957–2964. IEEE (2012)

    Google Scholar 

  32. Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Computation (NC) 14(8), 1771–1800 (2002)

    Article  Google Scholar 

  33. Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Computation (NC) 18(7), 1527–1554 (2006)

    Article  MathSciNet  Google Scholar 

  34. Hinton, G.E., Sejnowski, T.J.: Learning and relearning in Boltzmann machines. Parallel distributed processing: Explorations in the microstructure of cognition 1, 282–317 (1986)

    Google Scholar 

  35. Huiskes, M.J., Thomee, B., Lew, M.S.: New trends and ideas in visual concept detection: the mir flickr retrieval evaluation initiative. In: Multimedia Information Retrieval (MIR), pp. 527–536. ACM (2010)

    Google Scholar 

  36. Jégou, H., Chum, O.: Negative evidences and co-occurences in image retrieval: The benefit of pca and whitening. In: European Conference on Computer Vision (ECCV), pp. 774–787. Springer (2012)

    Google Scholar 

  37. Jégou, H., Douze, M., Schmid, C.: On the burstiness of visual elements. In: Computer Vision and Pattern Recognition (CVPR), pp. 1169–1176. IEEE (2009)

    Google Scholar 

  38. Jégou, H., Perronnin, F., Douze, M., Sanchez, J., Perez, P., Schmid, C.: Aggregating local image descriptors into compact codes. Pattern Analysis and Machine Intelligence (PAMI) 34(9), 1704–1716 (2012)

    Article  Google Scholar 

  39. Jégou, H., Zisserman, A.: Triangulation embedding and democratic aggregation for image search. In: Computer Vision and Pattern Recognition (CVPR), pp. 3310–3317. IEEE (2014)

    Google Scholar 

  40. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: Convolutional architecture for fast feature embedding. In: Conference on Multimedia (ACMMM), pp. 675–678. ACM (2014)

    Google Scholar 

  41. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Neural Information Processing Systems (NIPS) pp. 1–9 (2012). URL https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

  42. Kulis, B., Grauman, K.: Kernelized locality-sensitive hashing for scalable image search. In: International Conference on Computer Vision (ICCV), pp. 2130–2137. IEEE (2009)

    Google Scholar 

  43. Liao, Q., Leibo, J.Z., Poggio, T.: Learning invariant representations and applications to face verification. In: Neural Information Processing Systems (NIPS), pp. 3057–3065 (2013)

    Google Scholar 

  44. Lin, J., Duan, L.Y., Huang, T., Gao, W.: Robust Fisher codes for large scale image retrieval. In: International Conference on Acoustics and Signal Processing (ICASSP) (2013)

    Google Scholar 

  45. Lin, J., Duan, L.y., Wang, S., Bai, Y., Lou, Y., Chandrasekhar, V., Huang, T., Kot, A., Gao, W.: Hnip: Compact deep invariant representations for video matching, localization and retrieval. Transactions on Multimedia (TMM) (2017)

    Google Scholar 

  46. Lin, J., Morère, O., Chandrasekhar, V., Veillard, A., Goh, H.: Co-sparsity regularized deep hashing for image instance retrieval. In: International Conference on Image Processing (ICIP). IEEE (2016)

    Google Scholar 

  47. Lin, J., Morère, O., Petta, J., Chandrasekhar, V., Veillard, A.: Tiny descriptors for image retrieval with unsupervised triplet hashing. In: Data Compression Conference (DCC) (2016)

    Google Scholar 

  48. Lin, J., Morère, O., Veillard, A., Duan, L.Y., Goh, H., Chandrasekhar, V.: Deephash for image instance retrieval: Getting regularization, depth and fine-tuning right. In: ACM International Conference on Multimedia Retrieval (2017)

    Google Scholar 

  49. Liu, W., Wang, J., Ji, R., Jiang, Y.G., Chang, S.F.: Supervised hashing with kernels. In: Computer Vision and Pattern Recognition (CVPR), pp. 2074–2081. IEEE (2012)

    Google Scholar 

  50. Lou, Y., Bai, Y., Lin, J., Wang, S., Chen, J., Chandrasekhar, V., Duan, L.y., Tiejun, H., Kot, A., Gao, W.: Compact deep invariant descriptors for video retrieval. In: Data Compression Conference (DCC) (2017)

    Google Scholar 

  51. Lowe, D.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV) 60(2), 91–110 (2004)

    Article  Google Scholar 

  52. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV) 60(2), 91–110 (2004). doi: 10.1023/B:VISI.0000029664.99615.94. URL http://link.springer.com/10.1023/B:VISI.0000029664.99615.94

  53. Morère, O., Lin, J., Veillard, A., Duan, L.y., Chandrasekhar, V., Poggio, T.: Nested invariance pooling and rbm hashing for image instance retrieval. In: ACM International Conference on Multimedia Retrieval (2017)

    Google Scholar 

  54. Morère, O., Veillard, A., Lin, J., Petta, J., Chandrasekhar, V., Poggio, T.: Group invariant deep representations for image instance retrieval. In: AAAI Symposium on Science of Intelligence (2017)

    Google Scholar 

  55. Nair, V., Hinton, G.E.: 3D object recognition with deep belief nets. In: Neural Information Processing Systems (NIPS), pp. 1339–1347 (2009)

    Google Scholar 

  56. Norouzi, M., Blei, D.M.: Minimal loss hashing for compact binary codes. In: International Conference on Machine Learning (ICML), pp. 353–360 (2011)

    Google Scholar 

  57. Perronnin, F., Larlus, D.: Fisher vectors meet neural networks: A hybrid classification architecture. In: Computer Vision and Pattern Recognition (CVPR), pp. 3743–3752 (2015)

    Google Scholar 

  58. Perronnin, F., Liu, Y., Sánchez, J., Poirier, H.: Large-scale image retrieval with compressed fisher vectors. In: Computer Vision and Pattern Recognition (CVPR), pp. 3384–3391. IEEE (2010)

    Google Scholar 

  59. Raginsky, M., Lazebnik, S.: Locality-sensitive binary codes from shift-invariant kernels. In: Neural Information Processing Systems (NIPS), pp. 1509–1517 (2009)

    Google Scholar 

  60. Razavian, A., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: an astounding baseline for recognition. In: Computer Vision and Pattern Recognition Workshop (CVPR), pp. 806–813 (2014)

    Google Scholar 

  61. Salakhutdinov, R., Mnih, A., Hinton, G.: Restricted Boltzmann machines for collaborative filtering. In: International Conference on Machine Learning (ICML), pp. 791–798. ACM (2007)

    Google Scholar 

  62. Sharif Razavian, A., Sullivan, J., Maki, A., Carlsson, S.: A baseline for visual instance retrieval with deep convolutional networks. In: International Conference on Learning Representations (ICLR). ICLR (2015)

    Google Scholar 

  63. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (ICLR) (2015)

    Google Scholar 

  64. Smolensky, P.: Information processing in dynamical systems: foundations of harmony theory. In: Parallel distributed processing: explorations in the microstructure of cognition, pp. 194–281. MIT Press (1986)

    Google Scholar 

  65. Tolias, G., Avrithis, Y., Jégou, H.: To aggregate or not to aggregate: Selective match kernels for image search. In: International Conference on Computer Vision (ICCV), pp. 1401–1408 (2013)

    Google Scholar 

  66. Tolias, G., Sicre, R., Jégou, H.: Particular object retrieval with integral max-pooling of CNN activations. In: International Conference on Learning Representations (ICLR) (2016)

    Google Scholar 

  67. Torralba, A., Fergus, R., Weiss, Y.: Small codes and large image databases for recognition. In: Computer Vision and Pattern Recognition (CVPR), pp. 1–8. IEEE (2008)

    Google Scholar 

  68. Tuytelaars, T.: Dense interest points. In: Computer Vision and Pattern Recognition (CVPR), pp. 2281–2288. IEEE (2010)

    Google Scholar 

  69. Wang, J., Kumar, S., Chang, S.F.: Semi-supervised hashing for scalable image retrieval. In: Computer Vision and Pattern Recognition (CVPR), pp. 3424–3431. IEEE (2010)

    Google Scholar 

  70. Weiss, Y., Torralba, A., Fergus, R.: Spectral hashing. In: Neural Information Processing Systems (NIPS), pp. 1753–1760 (2009)

    Google Scholar 

  71. Winder, S., Hua, G., Brown, M.: Picking the best daisy. In: Computer Vision and Pattern Recognition (CVPR), pp. 178–185. IEEE (2009)

    Google Scholar 

  72. Yahoo: Yahoo! 100 million image data set. http://webscope.sandbox.yahoo.com/

  73. Yeo, C., Ahammad, P., Ramchandran, K.: Rate-efficient visual correspondences using random projections. In: International Conference on Image Processing (ICIP), pp. 217–220. IEEE (2008)

    Google Scholar 

  74. Zhang, C., Evangelopoulos, G., Voinea, S., Rosasco, L., Poggio, T.: A deep representation for invariance and music classification. In: Acoustics, Speech and Signal Processing (ICASSP), pp. 6984–6988. IEEE (2014)

    Google Scholar 

  75. Zhang, T., Du, C., Wang, J.: Composite quantization for approximate nearest neighbor search. In: International Conference on Machine Learning (ICML), pp. 838–846 (2014)

    Google Scholar 

  76. Zhang, T., Qi, G.J., Tang, J., Wang, J.: Sparse composite quantization. In: Computer Vision and Pattern Recognition (CVPR), pp. 4548–4556 (2015)

    Google Scholar 

Download references

Author information

Correspondence to Jie Lin.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Lin, J., Morère, O., Veillard, A., Chandrasekhar, V. (2019). Deep Learning-Based Descriptors for Object Instance Search. In: Jiang, X., Hadid, A., Pang, Y., Granger, E., Feng, X. (eds) Deep Learning in Object Detection and Recognition. Springer, Singapore. https://doi.org/10.1007/978-981-10-5152-4_8


  • DOI: https://doi.org/10.1007/978-981-10-5152-4_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-5151-7

  • Online ISBN: 978-981-10-5152-4

  • eBook Packages: Computer Science, Computer Science (R0)
