NearBucket-LSH: Efficient Similarity Search in P2P Networks

  • Naama Kraus
  • David Carmel
  • Idit Keidar
  • Meni Orenbach
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9939)


We present NearBucket-LSH, an effective algorithm for similarity search in large-scale distributed online social networks organized as peer-to-peer overlays. As communication is a dominant consideration in distributed systems, we focus on minimizing the network cost while guaranteeing good search quality. Our algorithm is based on Locality Sensitive Hashing (LSH), which limits the search to collections of objects, called buckets, that have a high probability of being similar to the query. More specifically, NearBucket-LSH employs an LSH extension that also searches in near buckets; this extension improves search quality but significantly increases the network cost. We decrease the network cost by considering the internals of both LSH and the P2P overlay, and harnessing their properties to our needs. We show that NearBucket-LSH increases search quality for a given network cost compared to previous art. In many cases, the search quality increases by more than 50%.
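To make the bucket and near-bucket idea concrete, the following is a minimal single-machine sketch, not the authors' implementation. It assumes random-hyperplane hashing for cosine similarity and assumes that a "near bucket" is one whose k-bit identifier differs from the query's bucket in exactly one bit. All function names are hypothetical, and the P2P overlay and its network cost are not modeled.

```python
import random
from collections import defaultdict


def make_hyperplanes(k, dim, seed=0):
    """Draw k random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]


def bucket_id(vec, hyperplanes):
    """Hash a vector to a k-bit bucket id: one sign bit per hyperplane."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vec))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def near_buckets(bucket, k):
    """Assumed definition: the k buckets at Hamming distance 1 from `bucket`."""
    return [bucket ^ (1 << i) for i in range(k)]


def build_index(vectors, hyperplanes):
    """Map each bucket id to the names of the vectors that hash into it."""
    index = defaultdict(list)
    for name, vec in vectors.items():
        index[bucket_id(vec, hyperplanes)].append(name)
    return index


def candidates(query, index, hyperplanes, use_near=True):
    """Collect candidates from the exact bucket and, optionally, its near buckets."""
    k = len(hyperplanes)
    exact = bucket_id(query, hyperplanes)
    probe = [exact] + (near_buckets(exact, k) if use_near else [])
    found = []
    for bid in probe:
        found.extend(index.get(bid, []))
    return found
```

In this sketch, probing near buckets multiplies the number of bucket lookups per hash table by k + 1. In a distributed setting each lookup may be a remote request, which illustrates why the near-bucket extension raises network cost unless the overlay's structure is exploited.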


Keywords: Hash Function · Success Probability · Online Social Network · Cosine Similarity · Network Cost



Naama Kraus is grateful to the Hasso-Plattner-Institut (HPI) for the scholarship for doctoral studies.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Naama Kraus (1), corresponding author
  • David Carmel (2)
  • Idit Keidar (1, 2)
  • Meni Orenbach (1)

  1. Viterbi EE, Technion, Haifa, Israel
  2. Yahoo Research, Haifa, Israel
