Cosine Approximate Nearest Neighbors

  • Conference paper
Data Science – Analytics and Applications

Abstract

Cosine similarity graph construction, or all-pairs similarity search, is an important kernel in many data mining and machine learning methods. Building the graph is a difficult task: solved naively, the problem requires comparing up to n² pairs of objects for a set of n objects. For large object sets, approximate solutions have been proposed that address the complexity of the task by retrieving most, but not necessarily all, of the nearest neighbors. We propose a new approximate graph construction method that combines properties of the object vectors to effectively select fewer comparison candidates, namely those that are likely to be neighbors. In addition, our method combines recently developed filtering strategies to quickly eliminate unpromising comparison candidates, which leads to fewer overall similarity computations and increased efficiency. We compare our method against several state-of-the-art approximate and exact baselines on six real-world datasets. Our results show that our approach offers a good tradeoff between efficiency and effectiveness, with a 35.81-fold efficiency improvement over the best alternative at 0.9 recall.
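
For orientation, the naive baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the paper's method: it compares every pair of objects to build an exact k-nearest-neighbor graph and then measures the recall of an approximate graph against it. The sparse dict representation and the function names `cosine`, `exact_knn_graph`, and `recall` are assumptions made for this sketch.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def exact_knn_graph(vectors, k):
    """Brute force: compare all n*(n-1)/2 pairs and keep the k most similar per object.

    Returns one list per object holding (similarity, neighbor_index) tuples.
    Pairs with zero similarity are not stored.
    """
    neighbors = [[] for _ in vectors]
    for i, j in combinations(range(len(vectors)), 2):
        sim = cosine(vectors[i], vectors[j])
        if sim > 0.0:
            neighbors[i].append((sim, j))
            neighbors[j].append((sim, i))
    # Sort by similarity, highest first, and truncate to k entries.
    return [sorted(row, key=lambda t: t[0], reverse=True)[:k] for row in neighbors]


def recall(approx, exact):
    """Fraction of the exact neighbors that the approximate graph also contains."""
    found = sum(
        len({j for _, j in a} & {j for _, j in e}) for a, e in zip(approx, exact)
    )
    total = sum(len(e) for e in exact)
    return found / total if total else 1.0
```

An approximate method would replace the loop over all pairs with a much smaller candidate set per object; `recall` then quantifies how many true neighbors that shortcut misses.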

Acknowledgment

This work was made possible in part by computing facilities provided by the Digital Technology Center (DTC) and the Minnesota Supercomputing Institute (MSI) at the University of Minnesota. We thank the reviewers for their helpful comments.

Author information

Corresponding author

Correspondence to David C. Anastasiu.

Copyright information

© 2017 Springer Fachmedien Wiesbaden GmbH

About this paper

Cite this paper

Anastasiu, D.C. (2017). Cosine Approximate Nearest Neighbors. In: Haber, P., Lampoltshammer, T., Mayr, M. (eds) Data Science – Analytics and Applications. Springer Vieweg, Wiesbaden. https://doi.org/10.1007/978-3-658-19287-7_6
