An efficient LSH indexing on discriminative short codes for high-dimensional nearest neighbors

  • Feng Xiaokang
  • Cui Jiangtao
  • Li Hui
  • Liu Yingfan


In the era of massive multimedia data, the curse of dimensionality and the I/O performance bottleneck have become the two major challenges for disk-based Approximate Nearest Neighbor (ANN) search. Hashing is a popular way to overcome the curse of dimensionality, and one promising hashing technique is Locality Sensitive Hashing (LSH). However, most existing LSH indexes incur significant I/O cost during search because each I/O access yields few true NN candidates. We propose a novel method, SC-LSH (SortingCodes-LSH), which combines LSH with another hashing technique (namely, discriminative short codes) to raise the NN candidate hit rate and thereby further improve ANN search performance. First, we build an LSH index and sort all the compound hash keys according to a linear order, so that similar NN candidates are stored close to one another. Then we generate product quantization (PQ) codes and use them as candidates in place of the original data points. These space-efficient short codes let us retrieve significantly more candidates with far fewer I/O operations. Moreover, based on theoretical and empirical studies of a series of space-filling curves, we choose the Gray curve as the linear order because it produces a better local distribution of candidate data. Together, these design choices significantly increase the NN hits per I/O, which greatly reduces the number of I/O accesses required. Meanwhile, thanks to their good similarity-preserving ability, PQ codes are precise enough to discriminate NNs and thus maintain accuracy. An empirical study demonstrates that, compared with four state-of-the-art methods, SC-LSH achieves the best accuracy with significantly smaller I/O cost and space consumption. Depending on the dataset, the I/O cost (resp., space consumption) of our scheme is only 5%–20% (resp., 1%–20%) of that of the other methods.
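As a rough illustration of the sorting step described above, the following Python sketch computes compound LSH keys and orders them along a Gray curve. It is not the authors' implementation. The abstract does not give the exact construction, so the sketch assumes p-stable (Gaussian) hash functions of the form floor((a·v + b)/w) and a common textbook definition of the Gray curve: bit-interleave the key components, then take the rank of the result in the reflected binary Gray code. All function names and the parameters `m`, `w`, and `seed` are hypothetical.

```python
import math
import random


def make_compound_hash(dim, m, w, seed=0):
    """Build m p-stable LSH functions h_i(v) = floor((a_i . v + b_i) / w)."""
    rng = random.Random(seed)
    params = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
        for _ in range(m)
    ]

    def compound_key(v):
        # One integer per hash function; the tuple is the compound key.
        return tuple(
            math.floor((sum(x * y for x, y in zip(a, v)) + b) / w)
            for a, b in params
        )

    return compound_key


def interleave(key, bits):
    """Interleave the low `bits` bits of each non-negative key component."""
    z = 0
    for bit in range(bits - 1, -1, -1):  # most significant bit first
        for k in key:
            z = (z << 1) | ((k >> bit) & 1)
    return z


def gray_decode(g):
    """Return the position of codeword g in the reflected binary Gray sequence."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def gray_rank(key, bits):
    """Position of a compound key along the Gray curve (assumed construction)."""
    return gray_decode(interleave(key, bits))


def sort_by_gray_order(points, compound_key):
    """Return point indices sorted by the Gray-curve rank of their LSH keys."""
    keys = [compound_key(p) for p in points]
    lowest = min(min(k) for k in keys)
    shifted = [tuple(c - lowest for c in k) for k in keys]  # make non-negative
    bits = max(1, max(max(k) for k in shifted).bit_length())
    return sorted(range(len(points)), key=lambda i: gray_rank(shifted[i], bits))


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.uniform(0, 10) for _ in range(16)] for _ in range(8)]
    h = make_compound_hash(dim=16, m=4, w=4.0)
    print(sort_by_gray_order(data, h))
```

Under this assumed construction, keys that are adjacent in the sorted order differ in a single bit of their interleaved representation whenever every key in the grid is present, which is one way to see why such an order tends to place similar compound keys on the same disk page. In a full pipeline, the sorted positions would hold short codes such as PQ codes rather than the original vectors.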


Keywords: Approximate nearest neighbor; Hashing; Locality-sensitive hashing; Discriminative short codes; Linear order



This work is supported by the National Natural Science Foundation of China (Nos. 61472298, 61672408, 61702403, U1135002), China 111 Project (No. B16037), China Postdoctoral Science Foundation (No. 2018M633473), Natural Science Basic Research Plan in Shaanxi Province of China (Program No. 2015JQ6227), SRF for ROCS, SEM, the Fundamental Research Funds for the Central Universities (No. JB170308, etc.) and the Innovation Fund of Xidian University.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Computer Science and Technology, Xidian University, Xi’an, China
  2. School of Cyber Engineering, Xidian University, Xi’an, China
  3. Department of System Engineering and Engineering Management, Chinese University of Hong Kong, Hong Kong, China
