An Efficient Exact Nearest Neighbor Search by Compounded Embedding

  • Mingjie Li
  • Ying Zhang
  • Yifang Sun
  • Wei Wang
  • Ivor W. Tsang
  • Xuemin Lin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10827)

Abstract

Nearest neighbor search (NNS) in high-dimensional space is a fundamental operation in applications across many domains, such as machine learning, databases, multimedia, and computer vision. In this paper, we first propose a novel and effective technique for computing lower bounds on Euclidean distance that combines linear and non-linear embedding methods. Each point in a high-dimensional space is embedded into a low-dimensional space such that the distance between two embedded points lower-bounds their distance in the original space. Following the filter-and-verify paradigm, we then develop an efficient exact NNS algorithm that prunes candidates using the new lower-bounding technique, thereby reducing the cost of expensive distance computations in the high-dimensional space. Our comprehensive experiments on 10 diverse real-life datasets, including image, video, audio, and text data, demonstrate that the new algorithm significantly outperforms state-of-the-art exact NNS techniques.
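
The filter-and-verify idea described in the abstract can be illustrated with a minimal sketch. It is not the embedding proposed in the paper: as a stand-in, it assumes a toy compound embedding that keeps the first m coordinates of a point (a linear projection) and appends the norm of the remaining coordinates (a non-linear map). By the reverse triangle inequality, the distance between two such embedded points never exceeds their distance in the original space. The names embed, dist, exact_nn, and the parameter m are hypothetical and chosen for illustration only.

```python
import math
import random


def embed(point, m):
    """Toy compound embedding (an assumption, not the paper's method):
    keep the first m coordinates and append the norm of the rest."""
    head = list(point[:m])
    tail_norm = math.sqrt(sum(v * v for v in point[m:]))
    return head + [tail_norm]


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_nn(query, data, embedded, m):
    """Exact nearest neighbor by filter-and-verify.

    Returns (index, distance, number of full distance computations).
    """
    q_emb = embed(query, m)
    # Filter: cheap lower bounds in the embedded space, ascending order.
    bounds = sorted((dist(q_emb, e), i) for i, e in enumerate(embedded))
    best_i, best_d, verified = -1, math.inf, 0
    for lower_bound, i in bounds:
        if lower_bound >= best_d:
            # All remaining candidates have an even larger lower bound,
            # so none of them can beat the current best.
            break
        # Verify: expensive distance in the original space.
        d = dist(query, data[i])
        verified += 1
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d, verified


# Usage on synthetic data (16 dimensions, embedded into 4 + 1).
random.seed(0)
points = [[random.gauss(0, 1) for _ in range(16)] for _ in range(200)]
index = [embed(p, 4) for p in points]
q = [random.gauss(0, 1) for _ in range(16)]
nn_index, nn_dist, n_verified = exact_nn(q, points, index, 4)
```

Visiting candidates in ascending order of their lower bound lets the loop stop at the first bound that reaches the best distance found so far, so n_verified is typically smaller than the dataset size. How much is pruned depends on how tight the bound is, which is the property the paper's compounded embedding is designed to improve.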

Notes

Acknowledgement

Ying Zhang is supported by ARC DE140100679 and DP170103710. Wei Wang is supported by ARC DP170103710 and by D2DCRC DC25002 and DC25003. Ivor W. Tsang is supported by ARC FT130100746, DP180100106, and LP150100671. Xuemin Lin is supported by NSFC 61672235, DP170101628, and DP180103096.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Mingjie Li (1)
  • Ying Zhang (1)
  • Yifang Sun (2)
  • Wei Wang (2)
  • Ivor W. Tsang (1)
  • Xuemin Lin (2)

  1. Centre for Artificial Intelligence, University of Technology Sydney, Sydney, Australia
  2. The University of New South Wales, Sydney, Australia
