
Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search

  • Conference paper
  • In: Advances in Knowledge Discovery and Data Mining (PAKDD 2019)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11440)


Abstract

Approximate nearest neighbor algorithms are used to speed up nearest neighbor search in a wide array of applications. However, current indexing methods feature several hyperparameters that need to be tuned to reach an acceptable accuracy–speed trade-off. A grid search in the parameter space is often impractically slow due to a time-consuming index-building procedure. Therefore, we propose an algorithm for automatically tuning the hyperparameters of indexing methods based on randomized space-partitioning trees. In particular, we present results using randomized k-d trees, random projection trees and randomized PCA trees. The tuning algorithm adds minimal overhead to the index-building process but is able to find the optimal hyperparameters accurately. We demonstrate that the algorithm is significantly faster than existing approaches, and that the indexing methods used are competitive with the state-of-the-art methods in query time while being faster to build.
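The indexes discussed in the abstract are built from randomized space-partitioning trees, and their hyperparameters (for example the number of trees and the tree depth) control the accuracy-speed trade-off. The sketch below is a minimal illustration of such an index, assuming a plain random-projection-tree forest queried by taking the union of leaf candidates and ranking them exactly; it is not the paper's tuning algorithm, and the function and parameter names are illustrative. It also shows why a naive grid search is slow: every hyperparameter setting rebuilds the forest from scratch.

```python
# Minimal illustrative sketch (not the paper's algorithm): a forest of random
# projection (RP) trees queried by exact search over the union of leaf
# candidates. Hyperparameter names (n_trees, depth) are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def build_rp_tree(data, indices, depth):
    """Recursively split the points by the median of a random projection."""
    if depth == 0 or len(indices) <= 1:
        return indices                       # leaf: the point indices it holds
    direction = rng.standard_normal(data.shape[1])
    proj = data[indices] @ direction
    split = np.median(proj)
    left, right = indices[proj <= split], indices[proj > split]
    if len(left) == 0 or len(right) == 0:    # degenerate split: stop early
        return indices
    return (direction, split,
            build_rp_tree(data, left, depth - 1),
            build_rp_tree(data, right, depth - 1))

def query_tree(node, q):
    """Route a query point to a leaf and return that leaf's candidates."""
    while isinstance(node, tuple):
        direction, split, left, right = node
        node = left if q @ direction <= split else right
    return node

def ann_search(data, trees, q, k):
    """Union the candidates from all trees, then rank them by exact distance."""
    candidates = np.unique(np.concatenate([query_tree(t, q) for t in trees]))
    dists = np.linalg.norm(data[candidates] - q, axis=1)
    return candidates[np.argsort(dists)[:k]]

# Naive grid search over the hyperparameters: every grid point rebuilds the
# whole forest, which is what makes this approach slow on large data sets.
data = rng.standard_normal((2000, 32)).astype(np.float32)
q = rng.standard_normal(32).astype(np.float32)
k = 10
exact = np.argsort(np.linalg.norm(data - q, axis=1))[:k]
for n_trees, depth in [(2, 8), (8, 8), (8, 5)]:
    trees = [build_rp_tree(data, np.arange(len(data)), depth)
             for _ in range(n_trees)]
    approx = ann_search(data, trees, q, k)
    recall = len(np.intersect1d(exact, approx)) / k
    print(f"n_trees={n_trees}, depth={depth}: recall@{k} = {recall:.2f}")
```

In a real grid search each hyperparameter combination would also be evaluated over many queries, so the repeated index construction inside the loop dominates the cost; this is the overhead the proposed autotuning algorithm is designed to avoid.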


Notes

  1. The gradient descent consistently converges with the learning rate \(\gamma = 0.01\) in all our experiments; we did not observe further tuning of the learning rate to be necessary (see the illustrative sketch after these notes).

  2. https://github.com/erikbern/ann-benchmarks.

  3. https://github.com/vioshyvo/mrpt.
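Note 1 states that the tuning algorithm's gradient descent converges with a fixed learning rate \(\gamma = 0.01\). The objective being minimized is not reproduced in this excerpt, so the sketch below only illustrates the fixed-step-size scheme on a stand-in least-squares objective; the function, its gradient, and the stopping rule are illustrative assumptions rather than the paper's.

```python
# Illustrative only: fixed-step gradient descent with gamma = 0.01 on a
# stand-in least-squares objective f(x) = ||Ax - b||^2. The objective used by
# the paper's tuning algorithm is not given in this excerpt.
import numpy as np

def gradient_descent(grad, x0, gamma=0.01, n_iters=1000, tol=1e-8):
    """Plain gradient descent with a constant learning rate gamma."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        step = gamma * grad(x)
        x = x - step
        if np.linalg.norm(step) < tol:   # stop once the updates are negligible
            break
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
grad = lambda x: 2 * A.T @ (A @ x - b)   # analytic gradient of ||Ax - b||^2

x_gd = gradient_descent(grad, np.zeros(5))
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print("distance to least-squares solution:", np.linalg.norm(x_gd - x_ls))
```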


Acknowledgments

This project was supported by Business Finland (project 3662/31/2018 Advanced Machine Learning for Industrial Applications) and the Academy of Finland (project 311277 TensorML).

Author information

Corresponding author

Correspondence to Ville Hyvönen.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Jääsaari, E., Hyvönen, V., Roos, T. (2019). Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11440. Springer, Cham. https://doi.org/10.1007/978-3-030-16145-3_46


  • DOI: https://doi.org/10.1007/978-3-030-16145-3_46


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-16144-6

  • Online ISBN: 978-3-030-16145-3

  • eBook Packages: Computer Science; Computer Science (R0)
