Fast Approximate Furthest Neighbors with Data-Dependent Candidate Selection

Curtin, Ryan R.; Gardner, Andrew B.

doi:10.1007/978-3-319-46759-7_17


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9939)

Included in the following conference series:

International Conference on Similarity Search and Applications


Abstract

We present a novel strategy for approximate furthest neighbor search that selects a candidate set using the data distribution. This strategy leads to an algorithm, which we call DrusillaSelect, that is able to outperform existing approximate furthest neighbor strategies. Our strategy is motivated by an empirical study of the behavior of the furthest neighbor search problem, which lends intuition for where our algorithm is most useful. We also present a variant of the algorithm that gives an absolute approximation guarantee; under some assumptions, the guaranteed approximation can be achieved in provably less time than brute-force search. Performance studies indicate that DrusillaSelect can achieve comparable levels of approximation to other algorithms while giving up to an order of magnitude speedup. An implementation is available in the mlpack machine learning library (found at http://www.mlpack.org).
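
The candidate-set idea described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper or from mlpack: the selection heuristic (favoring points that lie far along a direction through an extreme point and close to that line), the function names, and the parameters `num_projections` and `per_projection` are all assumptions made for illustration. The sketch only shows the general pattern of precomputing a small, data-dependent candidate set once and then scanning that set, instead of every point, for each query.

```python
import numpy as np


def brute_force_furthest(points, query):
    """Exact furthest neighbor by scanning every point: (index, distance)."""
    dists = np.linalg.norm(points - query, axis=1)
    idx = int(np.argmax(dists))
    return idx, float(dists[idx])


def select_candidates(points, num_projections=2, per_projection=3):
    """Hypothetical data-dependent candidate selection (illustrative only).

    Repeatedly takes the direction of the remaining point furthest from the
    data mean, then keeps the points that lie far along that direction and
    close to the line it defines.
    """
    centered = points - points.mean(axis=0)
    norms = np.linalg.norm(centered, axis=1)
    available = np.ones(len(points), dtype=bool)
    candidates = []
    for _ in range(num_projections):
        if not available.any():
            break
        seed = int(np.argmax(np.where(available, norms, -np.inf)))
        if norms[seed] == 0.0:
            break  # all remaining points coincide with the mean
        direction = centered[seed] / norms[seed]
        along = centered @ direction
        off_line = np.linalg.norm(
            centered - np.outer(along, direction), axis=1)
        # Points already chosen get -inf so they sort last.
        score = np.where(available, np.abs(along) - off_line, -np.inf)
        chosen = np.argsort(score)[::-1][:per_projection]
        chosen = chosen[available[chosen]]
        candidates.extend(int(i) for i in chosen)
        available[chosen] = False
    return np.array(candidates, dtype=int)


def approx_furthest(points, candidates, query):
    """Approximate furthest neighbor: scan only the candidate set."""
    idx, dist = brute_force_furthest(points[candidates], query)
    return int(candidates[idx]), dist
```

Under these assumptions, `select_candidates` would be called once per dataset and `approx_furthest` once per query, so each query costs a scan over `num_projections * per_projection` points rather than the whole dataset. The distance returned is never larger than the exact furthest-neighbor distance; how close it comes depends on the data and on the candidate-set size.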

Notes

1. This is where the algorithm gets its name; the first author’s cat displays the same behavior when selecting a food bowl to eat from.

Author information

Authors and Affiliations

Center for Advanced Machine Learning, Symantec Corporation, Atlanta, Georgia, 30338, USA
Ryan R. Curtin & Andrew B. Gardner

Corresponding author

Correspondence to Ryan R. Curtin.

Editor information

Editors and Affiliations

CNRS–IRISA, Rennes, France
Laurent Amsaleg
National Institute of Informatics, Tokyo, Japan
Michael E. Houle
Ludwig-Maximilians-Universität München, München, Germany
Erich Schubert

About this paper

Cite this paper

Curtin, R.R., Gardner, A.B. (2016). Fast Approximate Furthest Neighbors with Data-Dependent Candidate Selection. In: Amsaleg, L., Houle, M., Schubert, E. (eds) Similarity Search and Applications. SISAP 2016. Lecture Notes in Computer Science, vol 9939. Springer, Cham. https://doi.org/10.1007/978-3-319-46759-7_17

DOI: https://doi.org/10.1007/978-3-319-46759-7_17
Published: 27 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46758-0
Online ISBN: 978-3-319-46759-7
eBook Packages: Computer Science, Computer Science (R0)
