Skip to main content

Nearest Neighbors Graph Construction: Peer Sampling to the Rescue

  • Conference paper
  • First Online:
Networked Systems (NETYS 2016)

Part of the book series: Lecture Notes in Computer Science ((LNCCN,volume 9944))

Included in the following conference series:

Abstract

In this paper, we propose an efficient KNN service, called KPS (KNN-Peer-Sampling). The KPS service can be used in various contexts e.g. recommendation systems, information retrieval and data mining. KPS borrows concepts from P2P gossip-based clustering protocols to provide a localized and efficient KNN computation in large-scale systems. KPS is a sampling-based iterative approach, combining randomness, to provide serendipity and avoid local minimum, and clustering, to ensure fast convergence. We compare KPS against the state of the art KNN centralized computation algorithm NNDescent, on multiple datasets. The experiments confirm the efficiency of KPS over NNDescent: KPS improves significantly on the computational cost while converging quickly to a close to optimal KNN graph. For instance, the cost, expressed in number of pairwise similarity computations, is reduced by \(\approx \)23 % and \(\approx \)49 % to construct high quality KNN graphs for Jester and MovieLens datasets, respectively. In addition, the randomized nature of KPS ensures eventual convergence, not always achieved with NNDescent.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Note that the strength of KPS is to achieve good results with less information so this way of comparing is not in our favor.

References

  1. Jester dataset. http://grouplens.org/datasets/jester/

  2. Jester joke recommender. http://shadow.ieor.berkeley.edu/humor/

  3. Movielens dataset. http://grouplens.org/datasets/movielens/

  4. Agrawal, D., Das, S., El Abbadi, A.: Big data, cloud computing: current state and future opportunities. In: Proceedings of the 14th International Conference on Extending Database Technology, EDBT/ICDT 2011, pp. 530–533. ACM (2011)

    Google Scholar 

  5. Amato, G., Falchi, F.: KNN based image classification relying on local feature similarity. In: Proceedings of the Third International Conference on SImilarity Search and APplications, SISAP 2010, pp. 101–108. ACM (2010)

    Google Scholar 

  6. Bai, X., Guerraoui, R., Kermarrec, A.-M., Leroy, V.: Collaborative personalized top-k processing. ACM Trans. Database Syst. 36(4), 26 (2011)

    Article  Google Scholar 

  7. Bertier, M., Frey, D., Guerraoui, R., Kermarrec, A.-M., Leroy, V.: The gossple anonymous social network. In: Gupta, I., Mascolo, C. (eds.) Middleware 2010. LNCS, vol. 6452, pp. 191–211. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  8. Boiman, O., Shechtman, E., Irani, M.: In defense of nearest-neighbor based image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, pp. 1–8 (2008)

    Google Scholar 

  9. Boutet, A., Frey, D., Guerraoui, R., Jegou, A., Kermarrec, A.-M.: WhatsUp: a decentralized instant news recommender. In: Proceedings of the 27th IEEE International Symposium on Parallel Distributed Processing, IPDPS 2013, pp. 741–752 (2013)

    Google Scholar 

  10. Chen, J., Fang, H.-R., Saad, Y.: Fast approximate KNN graph construction for high dimensional data via recursive Lanczos bisection. J. Mach. Learn. Res. 10, 1989–2012 (2009)

    MathSciNet  MATH  Google Scholar 

  11. Connor, M., Kumar, P.: Fast construction of k-nearest neighbor graphs for point clouds. IEEE Trans. Vis. Comput. Graph. 16(4), 599–608 (2010)

    Article  Google Scholar 

  12. Desrosiers, C., Karypis, G.: A comprehensive survey of neighborhood-based recommendation methods. In: Ricci, F., Rokach, L., Shapira, B., Kantor, P.B. (eds.) Recommender Systems Handbook, pp. 107–144. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  13. Dong, W., Moses, C., Li, K.: Efficient k-nearest neighbor graph construction for generic similarity measures. In: Proceedings of the 20th International Conference on World Wide Web, WWW 2011, pp. 577–586. ACM (2011)

    Google Scholar 

  14. Giakkoupis, G., Kermarrec, A.-M., Woelfel, P.: Gossip protocols for renaming and sorting. In: Afek, Y. (ed.) DISC 2013. LNCS, vol. 8205, pp. 194–208. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  15. Guo, G., Wang, H., Bell, D.J., Bi, Y., Greer, K.: KNN model-based approach in classification. In: Meersman, R., Schmidt, D.C. (eds.) CoopIS 2003, DOA 2003, and ODBASE 2003. LNCS, vol. 2888, pp. 986–996. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  16. Hajebi, K., Abbasi-Yadkori, Y., Shahbazi, H., Zhang, H.: Fast approximate nearest-neighbor search with k-nearest neighbor graph. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, IJCAI 2011, pp. 1312–1317. AAAI Press (2011)

    Google Scholar 

  17. Huiskes, M.J., Lew, M.S.: The MIR flickr retrieval evaluation. In: Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, MIR 2008, pp. 39–43. ACM (2008)

    Google Scholar 

  18. Jelasity, M., Montresor, A., Babaoglu, O.: T-Man: gossip-based fast overlay topology construction. Comput. Netw.: Int. J. Comput. Telecommun. Netw. 53(13), 2321–2339 (2009)

    Article  MATH  Google Scholar 

  19. Jelasity, M., Voulgaris, S., Guerraoui, R., Kermarrec, A.-M., van Steen, M.: Gossip-based peer sampling. ACM Trans. Comput. Syst. 25(3) (2007)

    Google Scholar 

  20. Olman, V., Mao, F., Wu, H., Xu, Y.: Parallel clustering algorithm for large data sets with applications in bioinformatics. The IEEE/ACM Trans. Comput. Biol. Bioinform. 6(2), 344–352 (2009)

    Article  Google Scholar 

  21. Ormándi, R., Hegedűs, I., Jelasity, M.: Overlay management for fully distributed user-based collaborative filtering. In: D’Ambra, P., Guarracino, M., Talia, D. (eds.) Euro-Par 2010, Part I. LNCS, vol. 6271, pp. 446–457. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  22. Pan, R., Dolog, P., Xu, G.: KNN-based clustering for improving social recommender systems. In: Cao, L., Zeng, Y., Symeonidis, A.L., Gorodetsky, V.I., Yu, P.S., Singh, M.P. (eds.) ADMI. LNCS, vol. 7607, pp. 115–125. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  23. Paredes, R., Chávez, E., Figueroa, K., Navarro, G.: Practical construction of k-nearest neighbor graphs in metric spaces. In: Àlvarez, C., Serna, M. (eds.) WEA 2006. LNCS, vol. 4007, pp. 85–97. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  24. Voulgaris, S., van Steen, M.: VICINITY: a pinch of randomness brings out the structure. In: Eyers, D., Schwan, K. (eds.) Middleware 2013. LNCS, vol. 8275, pp. 21–40. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  25. Wang, J., Wang, J., Zeng, G., Tu, Z., Gan, R., Li, S.: Scalable k-NN graph construction for visual descriptors. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012, pp. 1106–1113 (2012)

    Google Scholar 

  26. Zhong, R., Li, G., Tan, K.-L., Zhou, L.: G-tree: an efficient index for knn search on road networks. In: Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013, pp. 39–48. ACM (2013)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yahya Benkaouz .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Benkaouz, Y., Erradi, M., Kermarrec, AM. (2016). Nearest Neighbors Graph Construction: Peer Sampling to the Rescue. In: Abdulla, P., Delporte-Gallet, C. (eds) Networked Systems. NETYS 2016. Lecture Notes in Computer Science(), vol 9944. Springer, Cham. https://doi.org/10.1007/978-3-319-46140-3_4

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-46140-3_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46139-7

  • Online ISBN: 978-3-319-46140-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics