Multimedia Tools and Applications, Volume 77, Issue 24, pp 32107–32131

Indexing of the CNN features for the large scale image search

  • Ruoyu Liu
  • Shikui Wei
  • Yao Zhao
  • Yi Yang


Convolutional neural network (CNN) features describe image content well and usually represent an image with a single feature vector. Although CNN features are more compact than local descriptors, they still cannot efficiently handle large-scale retrieval, because the cost of computation and storage grows linearly with the size of the database. To address this issue, we build a simple but effective indexing framework based on the inverted table, which significantly decreases both search time and memory usage. First, we thoroughly investigate several strategies for adapting the inverted table to CNN features and compensating for quantization error. We use multiple assignment for both query and database images to increase the probability that relevant images are assigned to the same visual word, where the visual words are obtained via clustering. Embedding codes are also introduced to improve retrieval accuracy by removing false matches. Second, we propose a novel indexing framework that combines the inverted table with hashing codes. This framework is faster than the inverted tables reformed with the strategies above. Experiments on several benchmark datasets demonstrate that our method retrieves images faster than brute-force search. We also provide a fair comparison between popular CNN features.
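The abstract combines three ideas: an inverted table keyed by clustered visual words, multiple assignment of each vector to several nearest words, and binary codes that filter out false matches inside the candidate lists. The sketch below illustrates that general pattern for global CNN features. It is not the authors' implementation; several parts are assumptions: the centroids are given (a stand-in for k-means), the binary codes come from the sign of a random projection (an LSH-style substitute, since the abstract does not specify the exact embedding or hashing codes), and the parameters `n_bits`, `ma_db`, `ma_query` and `max_hamming` are hypothetical.

```python
import numpy as np
from collections import defaultdict


class InvertedIndex:
    """Illustrative inverted table over global CNN features (not the paper's code)."""

    def __init__(self, centroids, n_bits=64, ma_db=2, ma_query=3, seed=0):
        # centroids: (k, d) visual words, assumed precomputed (e.g. by k-means).
        self.centroids = np.asarray(centroids, dtype=np.float32)
        rng = np.random.default_rng(seed)
        # Assumption: sign-of-random-projection codes stand in for embedding/hashing codes.
        self.proj = rng.standard_normal((self.centroids.shape[1], n_bits)).astype(np.float32)
        self.ma_db, self.ma_query = ma_db, ma_query  # multiple-assignment widths
        self.lists = defaultdict(list)  # visual word id -> list of image ids
        self.features = {}              # image id -> feature, kept for exact re-ranking
        self.codes = {}                 # image id -> packed binary code (uint8 array)

    def _nearest_words(self, x, m):
        # Indices of the m centroids closest to x in Euclidean distance.
        dists = np.linalg.norm(self.centroids - x, axis=1)
        return np.argsort(dists)[:m]

    def _code(self, x):
        # One bit per projection: 1 if the projected value is positive.
        return np.packbits(x @ self.proj > 0)

    def add(self, image_id, x):
        x = np.asarray(x, dtype=np.float32)
        self.features[image_id] = x
        self.codes[image_id] = self._code(x)
        # Multiple assignment on the database side: one posting per nearest word.
        for w in self._nearest_words(x, self.ma_db):
            self.lists[int(w)].append(image_id)

    def search(self, q, top_k=10, max_hamming=None):
        q = np.asarray(q, dtype=np.float32)
        q_code = self._code(q)
        # Multiple assignment on the query side; a set removes duplicate candidates.
        candidates = set()
        for w in self._nearest_words(q, self.ma_query):
            candidates.update(self.lists.get(int(w), []))
        scored = []
        for i in candidates:
            hamming = int(np.unpackbits(q_code ^ self.codes[i]).sum())
            if max_hamming is not None and hamming > max_hamming:
                continue  # binary codes disagree too much: treat as a false match
            scored.append((float(np.linalg.norm(self.features[i] - q)), i))
        scored.sort(key=lambda t: t[0])  # ascending Euclidean distance
        return [i for _, i in scored[:top_k]]


# Usage on random unit vectors (synthetic data, not a benchmark).
rng = np.random.default_rng(1)
db = rng.standard_normal((1000, 128)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)
centroids = db[rng.choice(1000, 16, replace=False)]  # stand-in for k-means centroids
index = InvertedIndex(centroids)
for i, x in enumerate(db):
    index.add(i, x)
print(index.search(db[42], top_k=5))  # the query image itself (id 42) should rank first
```

Only the images posted under the query's nearest words are scored, which is where the speed-up over brute-force search comes from; raising `ma_query` trades speed for recall. For simplicity this sketch keeps the raw features for exact re-ranking, so it does not model the memory savings the abstract reports.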


Keywords: Convolutional neural network, Indexing, Inverted table



This work was supported in part by the National Natural Science Foundation of China (No. 61532005, No. 61572065), the National Key Research and Development Program of China (No. 2016YFB0800404, No. 2017YFC1703503), and the Joint Fund of Ministry of Education of China and China Mobile (No. MCM20160102).

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Beijing Jiaotong University, Beijing, China
  2. University of Technology Sydney, Ultimo, Australia
