Faster Dual-Tree Traversal for Nearest Neighbor Search

  • Conference paper
Similarity Search and Applications (SISAP 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9371)

Abstract

Nearest neighbor search is a nearly ubiquitous problem in computer science. When nearest neighbors are desired for a query set rather than a single query point, dual-tree algorithms often provide the fastest solution, especially in low-to-medium dimensions (i.e., up to a hundred or so), and, unlike hashing techniques, can give exact results or absolute approximation guarantees. Building on a recent decomposition of dual-tree algorithms into modular pieces, we propose a new piece: an improved traversal strategy that is applicable to any dual-tree algorithm. Applied to nearest neighbor search using both kd-trees and ball trees, the new strategy demonstrably outperforms the previous fastest approaches. The traversal can also be applied easily to other problems, including kernel density estimation and max-kernel search.
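
To make the setting concrete, the following is a minimal sketch of a conventional depth-first dual-tree nearest neighbor search over two kd-trees. It is not the improved traversal proposed in the paper, only an illustration of the kind of algorithm the traversal plugs into. All names (`Node`, `min_dist`, `traverse`, `all_nearest_neighbors`) and the leaf size are assumptions chosen for this example, and the pruning bound is the simplest valid one rather than a tuned one.

```python
import math
import random


class Node:
    """kd-tree node holding point indices and an axis-aligned bounding box."""

    def __init__(self, points, indices, leaf_size=4):
        # Assumes `indices` is non-empty and all points share one dimension.
        self.indices = indices
        dims = range(len(points[0]))
        self.lo = [min(points[i][d] for i in indices) for d in dims]
        self.hi = [max(points[i][d] for i in indices) for d in dims]
        # Upper bound on the current best distance of every point in this node.
        self.bound = math.inf
        self.children = []
        if len(indices) > leaf_size:
            # Split at the median along the widest dimension.
            d = max(dims, key=lambda k: self.hi[k] - self.lo[k])
            order = sorted(indices, key=lambda i: points[i][d])
            mid = len(order) // 2
            self.children = [
                Node(points, order[:mid], leaf_size),
                Node(points, order[mid:], leaf_size),
            ]


def min_dist(a, b):
    """Smallest possible Euclidean distance between the boxes of two nodes."""
    total = 0.0
    for alo, ahi, blo, bhi in zip(a.lo, a.hi, b.lo, b.hi):
        gap = max(0.0, blo - ahi, alo - bhi)
        total += gap * gap
    return math.sqrt(total)


def traverse(nq, nr, queries, refs, best_dist, best_idx):
    """Depth-first dual-tree traversal over a query node and a reference node."""
    # Prune: no reference point in nr can improve any query point in nq.
    if min_dist(nq, nr) > nq.bound:
        return
    if not nq.children and not nr.children:
        # Base case: compare every query point with every reference point.
        for qi in nq.indices:
            for ri in nr.indices:
                d = math.dist(queries[qi], refs[ri])
                if d < best_dist[qi]:
                    best_dist[qi], best_idx[qi] = d, ri
        nq.bound = max(best_dist[qi] for qi in nq.indices)
        return
    q_kids = nq.children or [nq]
    r_kids = nr.children or [nr]
    for cq in q_kids:
        # Visit the closer reference child first so bounds tighten early.
        for cr in sorted(r_kids, key=lambda c: min_dist(cq, c)):
            traverse(cq, cr, queries, refs, best_dist, best_idx)
    if nq.children:
        nq.bound = max(c.bound for c in nq.children)


def all_nearest_neighbors(queries, refs, leaf_size=4):
    """Return (indices, distances) of the nearest reference point per query."""
    q_root = Node(queries, list(range(len(queries))), leaf_size)
    r_root = Node(refs, list(range(len(refs))), leaf_size)
    best_dist = [math.inf] * len(queries)
    best_idx = [-1] * len(queries)
    traverse(q_root, r_root, queries, refs, best_dist, best_idx)
    return best_idx, best_dist


if __name__ == "__main__":
    random.seed(0)
    refs = [(random.random(), random.random()) for _ in range(100)]
    queries = [(random.random(), random.random()) for _ in range(10)]
    idx, dist = all_nearest_neighbors(queries, refs)
    print(list(zip(idx, dist))[:3])
```

The order in which node pairs are visited is the part a traversal strategy controls: visiting promising pairs earlier tightens each query node's bound sooner and lets more pairs be pruned. The sketch uses a simple nearest-child-first ordering for that purpose.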

Author information

Corresponding author

Correspondence to Ryan R. Curtin.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Curtin, R.R. (2015). Faster Dual-Tree Traversal for Nearest Neighbor Search. In: Amato, G., Connor, R., Falchi, F., Gennaro, C. (eds) Similarity Search and Applications. SISAP 2015. Lecture Notes in Computer Science, vol 9371. Springer, Cham. https://doi.org/10.1007/978-3-319-25087-8_7

  • DOI: https://doi.org/10.1007/978-3-319-25087-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25086-1

  • Online ISBN: 978-3-319-25087-8

  • eBook Packages: Computer Science, Computer Science (R0)
