Cosine Approximate Nearest Neighbors

Anastasiu, David C.

doi:10.1007/978-3-658-19287-7_6



Abstract

Cosine similarity graph construction, or all-pairs similarity search, is an important kernel in many data mining and machine learning methods. Building the graph is a difficult task: solved naively, the problem requires comparing up to n² pairs of objects for a set of n objects. For large object sets, approximate solutions have been proposed that address the complexity of the task by retrieving most, but not necessarily all, of the nearest neighbors. We propose a new approximate graph construction method that combines properties of the object vectors to effectively select a smaller number of comparison candidates that are likely to be neighbors. In addition, our method combines recently developed filtering strategies to quickly eliminate unpromising comparison candidates, which leads to fewer overall similarity computations and increased efficiency. We compare our method against several popular approximate and exact baselines on six real-world datasets. Our results show that our approach provides a good tradeoff between efficiency and effectiveness, with a 35.81-fold efficiency improvement over the best alternative at 0.9 recall.
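To make the cost of the naive approach and the meaning of recall concrete, here is a minimal Python sketch of exact cosine k-nearest-neighbor graph construction by comparing every pair. It is not the method proposed in the chapter; it is only the brute-force baseline that approximate methods try to avoid. The function names (normalize, dot, exact_knn_graph, recall), the dict-based sparse vector format, and the toy data are all assumptions made for illustration.

```python
import math
from itertools import combinations


def normalize(vec):
    """Scale a sparse vector (dict: feature -> weight) to unit L2 norm."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {f: w / norm for f, w in vec.items()} if norm > 0 else {}


def dot(a, b):
    """Sparse dot product; for unit-length vectors this is the cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b[f] for f, w in a.items() if f in b)


def exact_knn_graph(vectors, k):
    """Naive construction: compare all n*(n-1)/2 pairs, keep the top-k per object.

    Returns one list per object holding (similarity, neighbor_index) tuples,
    sorted by decreasing similarity. Pairs with zero similarity are skipped.
    """
    unit = [normalize(v) for v in vectors]
    sims = [[] for _ in unit]
    for i, j in combinations(range(len(unit)), 2):
        s = dot(unit[i], unit[j])
        if s > 0:
            sims[i].append((s, j))
            sims[j].append((s, i))
    return [sorted(row, key=lambda t: (-t[0], t[1]))[:k] for row in sims]


def recall(approx, exact):
    """Fraction of the exact neighbors that the approximate graph also contains."""
    found = sum(
        len({j for _, j in a} & {j for _, j in e}) for a, e in zip(approx, exact)
    )
    total = sum(len(e) for e in exact)
    return found / total if total else 1.0


# Toy data (hypothetical): four sparse "documents" over features a-d.
docs = [
    {"a": 1.0, "b": 1.0},
    {"a": 1.0, "b": 1.0, "c": 0.1},
    {"c": 1.0, "d": 1.0},
    {"d": 1.0},
]
graph = exact_knn_graph(docs, k=1)
```

The double loop over all pairs is what makes this baseline quadratic in n. An approximate method returns a graph that may miss some true neighbors, and recall(approx, exact) measures how many of them it kept; a recall of 0.9 means 90% of the exact neighbors were found.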



Acknowledgment

This work was made possible in part by computing facilities provided by the Digital Technology Center (DTC) and the Minnesota Supercomputing Institute (MSI) at the University of Minnesota. We thank the reviewers for their helpful comments.

Author information

Authors and Affiliations

Department of Computer Engineering, San José State University, San José, CA, USA
David C. Anastasiu


Corresponding author

Correspondence to David C. Anastasiu.

Editor information

Editors and Affiliations

Fachhochschule Salzburg, Puch/Salzburg, Austria
Peter Haber
Informationstechnik & System-Management, Fachhochschule Salzburg, Puch/Salzburg, Austria
Thomas Lampoltshammer
Informationstechnik & System-Management, Fachhochschule Salzburg, Puch/Salzburg, Austria
Manfred Mayr


About this paper

Cite this paper

Anastasiu, D.C. (2017). Cosine Approximate Nearest Neighbors. In: Haber, P., Lampoltshammer, T., Mayr, M. (eds) Data Science – Analytics and Applications. Springer Vieweg, Wiesbaden. https://doi.org/10.1007/978-3-658-19287-7_6


DOI: https://doi.org/10.1007/978-3-658-19287-7_6
Published: 16 September 2017
Publisher Name: Springer Vieweg, Wiesbaden
Print ISBN: 978-3-658-19286-0
Online ISBN: 978-3-658-19287-7
eBook Packages: Computer Science and Engineering (German Language)
