
Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search

  • Conference paper
  • In: Advances in Knowledge Discovery and Data Mining (PAKDD 2019)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11440)


Abstract

Approximate nearest neighbor algorithms are used to speed up nearest neighbor search in a wide array of applications. However, current indexing methods feature several hyperparameters that need to be tuned to reach an acceptable accuracy–speed trade-off. A grid search in the parameter space is often impractically slow due to a time-consuming index-building procedure. Therefore, we propose an algorithm for automatically tuning the hyperparameters of indexing methods based on randomized space-partitioning trees. In particular, we present results using randomized k-d trees, random projection trees and randomized PCA trees. The tuning algorithm adds minimal overhead to the index-building process but is able to find the optimal hyperparameters accurately. We demonstrate that the algorithm is significantly faster than existing approaches, and that the indexing methods used are competitive with the state-of-the-art methods in query time while being faster to build.
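The indexes discussed in the abstract are built from randomized space-partitioning trees, and their hyperparameters (for example the number of trees and the tree depth) control the accuracy-speed trade-off. The sketch below is a minimal illustration of such an index, assuming a plain random-projection-tree forest queried by taking the union of leaf candidates and ranking them exactly; it is not the paper's tuning algorithm, and the function and parameter names are illustrative. It also shows why a naive grid search is slow: every hyperparameter setting rebuilds the forest from scratch.

```python
# Minimal illustrative sketch (not the paper's algorithm): a forest of random
# projection (RP) trees queried by exact search over the union of leaf
# candidates. Hyperparameter names (n_trees, depth) are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def build_rp_tree(data, indices, depth):
    """Recursively split the points by the median of a random projection."""
    if depth == 0 or len(indices) <= 1:
        return indices                       # leaf: the point indices it holds
    direction = rng.standard_normal(data.shape[1])
    proj = data[indices] @ direction
    split = np.median(proj)
    left, right = indices[proj <= split], indices[proj > split]
    if len(left) == 0 or len(right) == 0:    # degenerate split: stop early
        return indices
    return (direction, split,
            build_rp_tree(data, left, depth - 1),
            build_rp_tree(data, right, depth - 1))

def query_tree(node, q):
    """Route a query point to a leaf and return that leaf's candidates."""
    while isinstance(node, tuple):
        direction, split, left, right = node
        node = left if q @ direction <= split else right
    return node

def ann_search(data, trees, q, k):
    """Union the candidates from all trees, then rank them by exact distance."""
    candidates = np.unique(np.concatenate([query_tree(t, q) for t in trees]))
    dists = np.linalg.norm(data[candidates] - q, axis=1)
    return candidates[np.argsort(dists)[:k]]

# Naive grid search over the hyperparameters: every grid point rebuilds the
# whole forest, which is what makes this approach slow on large data sets.
data = rng.standard_normal((2000, 32)).astype(np.float32)
q = rng.standard_normal(32).astype(np.float32)
k = 10
exact = np.argsort(np.linalg.norm(data - q, axis=1))[:k]
for n_trees, depth in [(2, 8), (8, 8), (8, 5)]:
    trees = [build_rp_tree(data, np.arange(len(data)), depth)
             for _ in range(n_trees)]
    approx = ann_search(data, trees, q, k)
    recall = len(np.intersect1d(exact, approx)) / k
    print(f"n_trees={n_trees}, depth={depth}: recall@{k} = {recall:.2f}")
```

In a real grid search each hyperparameter combination would also be evaluated over many queries, so the repeated index construction inside the loop dominates the cost; this is the overhead the proposed autotuning algorithm is designed to avoid.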


Notes

  1. The gradient descent consistently converges with the learning rate \(\gamma = 0.01\) in all our experiments; we did not observe further tuning of the learning rate to be necessary (see the illustrative sketch after these notes).

  2. https://github.com/erikbern/ann-benchmarks.

  3. https://github.com/vioshyvo/mrpt.
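Note 1 states that the tuning algorithm's gradient descent converges with a fixed learning rate \(\gamma = 0.01\). The objective being minimized is not reproduced in this excerpt, so the sketch below only illustrates the fixed-step-size scheme on a stand-in least-squares objective; the function, its gradient, and the stopping rule are illustrative assumptions rather than the paper's.

```python
# Illustrative only: fixed-step gradient descent with gamma = 0.01 on a
# stand-in least-squares objective f(x) = ||Ax - b||^2. The objective used by
# the paper's tuning algorithm is not given in this excerpt.
import numpy as np

def gradient_descent(grad, x0, gamma=0.01, n_iters=1000, tol=1e-8):
    """Plain gradient descent with a constant learning rate gamma."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        step = gamma * grad(x)
        x = x - step
        if np.linalg.norm(step) < tol:   # stop once the updates are negligible
            break
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
grad = lambda x: 2 * A.T @ (A @ x - b)   # analytic gradient of ||Ax - b||^2

x_gd = gradient_descent(grad, np.zeros(5))
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print("distance to least-squares solution:", np.linalg.norm(x_gd - x_ls))
```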


Acknowledgments

This project was supported by Business Finland (project 3662/31/2018 Advanced Machine Learning for Industrial Applications) and the Academy of Finland (project 311277 TensorML).

Author information

Corresponding author

Correspondence to Ville Hyvönen.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Jääsaari, E., Hyvönen, V., Roos, T. (2019). Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11440. Springer, Cham. https://doi.org/10.1007/978-3-030-16145-3_46


  • DOI: https://doi.org/10.1007/978-3-030-16145-3_46


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-16144-6

  • Online ISBN: 978-3-030-16145-3

  • eBook Packages: Computer Science; Computer Science (R0)
