Fast Approximate Furthest Neighbors with Data-Dependent Candidate Selection

  • Conference paper
Similarity Search and Applications (SISAP 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9939)

Abstract

We present a novel strategy for approximate furthest neighbor search that selects a candidate set using the data distribution. This strategy leads to an algorithm, which we call DrusillaSelect, that is able to outperform existing approximate furthest neighbor strategies. Our strategy is motivated by an empirical study of the behavior of the furthest neighbor search problem, which provides intuition about when our algorithm is most useful. We also present a variant of the algorithm that gives an absolute approximation guarantee; under some assumptions, the guaranteed approximation can be achieved in provably less time than brute-force search. Performance studies indicate that DrusillaSelect can achieve comparable levels of approximation to other algorithms while giving up to an order of magnitude speedup. An implementation is available in the mlpack machine learning library (found at http://www.mlpack.org).
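The abstract does not spell out the selection rule, so the following is only a minimal sketch of what data-dependent candidate selection for furthest neighbor search could look like. It assumes a simple scheme: center the data, repeatedly use the unused point furthest from the mean as a direction, keep the `per_dir` unused points that lie far along that direction and close to its line (score `|offset| - residual`), and answer each query by brute force over the resulting candidate set. The function names, the scoring rule, and the parameters `num_dirs` and `per_dir` are illustrative assumptions; this is not the published DrusillaSelect algorithm and not mlpack's API.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_candidates(points, num_dirs, per_dir):
    """Pick up to num_dirs * per_dir candidate indices (hypothetical scheme).

    Assumed rule: center the data, use the unused point furthest from the
    mean as a direction, then keep the per_dir unused points with the largest
    |offset along the direction| minus their distance from the direction's line.
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    centered = [[p[k] - mean[k] for k in range(d)] for p in points]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    used = [False] * n
    candidates = []
    for _ in range(num_dirs):
        remaining = [j for j in range(n) if not used[j]]
        if not remaining:
            break
        best = max(remaining, key=lambda j: norms[j])
        if norms[best] == 0.0:
            break  # every unused point sits on the mean; no direction to use
        line = [x / norms[best] for x in centered[best]]  # unit direction
        scored = []
        for j in remaining:
            offset = sum(x * u for x, u in zip(centered[j], line))
            residual = math.sqrt(sum((x - offset * u) ** 2
                                     for x, u in zip(centered[j], line)))
            scored.append((abs(offset) - residual, j))
        scored.sort(reverse=True)  # highest score first
        for _, j in scored[:per_dir]:
            used[j] = True  # a point is selected at most once
            candidates.append(j)
    return candidates


def approx_furthest(query, points, candidates):
    """Approximate furthest neighbor: brute force over the candidate set only."""
    return max(candidates, key=lambda j: dist(query, points[j]))


def exact_furthest(query, points):
    """Exact furthest neighbor by brute force over all points, for comparison."""
    return max(range(len(points)), key=lambda j: dist(query, points[j]))


if __name__ == "__main__":
    random.seed(0)
    data = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(500)]
    cand = select_candidates(data, num_dirs=5, per_dir=4)  # at most 20 indices
    q = [0.5, -0.5, 1.0]
    a, e = approx_furthest(q, data, cand), exact_furthest(q, data)
    print(len(cand), dist(q, data[a]), dist(q, data[e]))
```

In this sketch each query costs `num_dirs * per_dir` distance evaluations instead of one per point, which is where any speedup over brute-force search would come from; the answer is only approximate, because the true furthest neighbor may not be among the candidates.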

Notes

  1. This is where the algorithm gets its name; the first author’s cat displays the same behavior when selecting a food bowl to eat from.

Author information

Corresponding author

Correspondence to Ryan R. Curtin.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Curtin, R.R., Gardner, A.B. (2016). Fast Approximate Furthest Neighbors with Data-Dependent Candidate Selection. In: Amsaleg, L., Houle, M., Schubert, E. (eds) Similarity Search and Applications. SISAP 2016. Lecture Notes in Computer Science, vol 9939. Springer, Cham. https://doi.org/10.1007/978-3-319-46759-7_17

  • DOI: https://doi.org/10.1007/978-3-319-46759-7_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46758-0

  • Online ISBN: 978-3-319-46759-7

  • eBook Packages: Computer Science, Computer Science (R0)
