Advertisement

Peer-to-Peer Web Search: Euphoria, Achievements, Disillusionment, and Future Opportunities

  • Gerhard Weikum
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6462)

Abstract

The peer-to-peer (P2P) computing paradigm has been very successful like file sharing in Internet-wide communities (e.g., Gnutella, BitTorrent) or IP telephony (e.g., Skype). P2P systems promise perfect scalability from few peers to many millions, and resilience to failures, dynamic variability, and even misbehaving peers with egoistic or even malicious behavior. None of these salient properties requires any global planning, administration, or control; so P2P systems are completely self-organizing.

Web search seems to be a perfect match for P2P architectures. The Web has naturally distributed data, spread across the entire Internet, as opposed to artifically hosting all content by a centralized search engine. For user-provided contents in Web 2.0 communities, consideration of the content ownership, the autonomy of users, and the individualized control of privacy would also suggest decentralized solutions with many peers. Using the combined power and knowledge of millions of users and their computers could offer a more informative and pluralistic view of the world’s information. A P2P search engine could benefit from the intellectual input – bookmarks, queries, clicks – of a large user community, without undue risks about privacy or censorship, because users can gather logs on their own computers and control further sharing and aggregation by their individual policies. These potential benefits have motivated a wealth of exciting research on algorithms and systems for P2P Web search. This paper gives a brief overview on the last decade’s research achievements along these lines.

Despite all these intriguing promises and notwithstanding the impressive success of simpler file-sharing applications, P2P approaches to Web search or Web 2.0 services did not make a significant impact on the practical deployment side. The wave of P2P euphoria in academic research was followed by a phase of disillusionment about the lack of business models and user incentives. This paper discusses these shortcomings, and points out new opportunities for the P2P paradigm to play a more successful role in future Web applications.

Keywords

Overlay Network Distribute Hash Table Inverted List Semantic Overlay Network Unstructured Overlay 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Anand, A., Bedathur, S.J., Berberich, K., Schenkel, R., Tryfonopoulos, C.: EverLast: a distributed architecture for preserving the web. In: JCDL 2009, pp. 331–340 (2009)Google Scholar
  2. 2.
    Baeza-Yates, R.A., Castillo, C., Junqueira, F., Plachouras, V., Silvestri, F.: Challenges on Distributed Web Retrieval. In: ICDE 2007, pp. 6–20 (2007)Google Scholar
  3. 3.
    Balke, W.-T., Nejdl, W., Siberski, W., Thaden, U.: Progressive Distributed Top k Retrieval in Peer-to-Peer Networks. In: ICDE 2005, pp. 174–185 (2005)Google Scholar
  4. 4.
    Barroso, L.A., Dean, J., Hlzle, U.: Web Search for a Planet: The Google Cluster Architecture. IEEE Micro 23(2), 22–28 (2003)CrossRefGoogle Scholar
  5. 5.
    Bender, M., Ntarmos, N., Triantafillou, P., Weikum, G., Zimmer, C.: Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices. In: CIKM 2006, pp. 172–181 (2006)Google Scholar
  6. 6.
    Bender, M., Michel, S., Triantafillou, P., Weikum, G.: Global Document Frequency Estimation in Peer-to-Peer Web Search. In: WebDB (2006)Google Scholar
  7. 7.
    Bender, M., Michel, S., Parreira, J.X., Crecelius, T.: P2P Web Search: Make It Light, Make It Fly. In: CIDR 2007, pp. 164–168 (2007)Google Scholar
  8. 8.
    Borodin, A., Roberts, G.O., Rosenthal, J.S., Tsaparas, P.: Link analysis ranking: algorithms, theory, and experiments. ACM Trans. Internet Techn. 5(1), 231–297 (2005)CrossRefGoogle Scholar
  9. 9.
    Callan, J.P., Lu, Z., Bruce Croft, W.: Searching Distributed Collections with Inference Networks. SIGIR, 21–28 (1995)Google Scholar
  10. 10.
    Cao, P., Wang, Z.: Efficient top-K query calculation in distributed networks. In: PODC 2004, pp. 206–215 (2004)Google Scholar
  11. 11.
    Crespo, A., Garcia-Molina, H.: Semantic Overlay Networks for P2P Systems. In: Moro, G., Bergamaschi, S., Aberer, K. (eds.) AP2PC 2004. LNCS (LNAI), vol. 3601, pp. 1–13. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  12. 12.
    Cuenca-Acuna, F.M., Peery, C., Martin, R.P., Nguyen, T.D.: PlanetP: Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities. In: HPDC 2003, pp. 236–249 (2003)Google Scholar
  13. 13.
    Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4), 614–656 (2003)MathSciNetCrossRefzbMATHGoogle Scholar
  14. 14.
    Flajolet, P., Nigel Martin, G.: Probabilistic Counting Algorithms for Data Base Applications. J. Comput. Syst. Sci. 31(2), 182–209 (1985)MathSciNetCrossRefzbMATHGoogle Scholar
  15. 15.
    Gravano, L., Garcia-Molina, H., Tomasic, A.: GlOSS: Text-Source Discovery over the Internet. ACM Trans. Database Syst. 24(2), 229–264 (1999)CrossRefGoogle Scholar
  16. 16.
    Guha, R.V., Kumar, R., Raghavan, P., Tomkins, A.: Propagation of trust and distrust. In: WWW 2004, pp. 403–412 (2004)Google Scholar
  17. 17.
    Harth, A., Hose, K., Karnstedt, M., Polleres, A., Sattler, K.-U., Umbrich, J.: Data summaries for on-demand queries over linked data. In: WWW 2010, pp. 411–420 (2010)Google Scholar
  18. 18.
    Hartig, O., Bizer, C., Freytag, J.C.: Executing SPARQL Queries over the Web of Linked DataGoogle Scholar
  19. 19.
    Kalnis, P., Ng, W.S., Ooi, B.C., Tan, K.-L.: Answering similarity queries in peer-to-peer networks. Inf. Syst. 31(1), 57–72 (2006)CrossRefGoogle Scholar
  20. 20.
    Jelasity, M., Voulgaris, S., Guerraoui, R., Kermarrec, A.-M., van Steen, M.: Gossip-based peer sampling. ACM Trans. Comput. Syst. 25(3) (2007)Google Scholar
  21. 21.
    Kempe, D., Dobra, A., Gehrke, J.: Gossip-Based Computation of Aggregate Information. In: FOCS 2003, pp. 482–491 (2003)Google Scholar
  22. 22.
    Kempe, D., McSherry, F.: A decentralized algorithm for spectral analysis. In: STOC 2004, pp. 561–568 (2004)Google Scholar
  23. 23.
    Lu, J., Callan, J.P.: Content-based retrieval in hybrid peer-to-peer networks. In: CIKM 2003, pp. 199–206 (2003)Google Scholar
  24. 24.
    Mahlmann, P., Schindelhauer, C.: Distributed random digraph transformations for peer-to-peer networks. In: SPAA 2006, pp. 308–317 (2006)Google Scholar
  25. 25.
    Meng, W., Yu, C.T., Liu, K.-L.: Building efficient and effective metasearch engines. ACM Comput. Surv. 34(1), 48–89 (2002)CrossRefGoogle Scholar
  26. 26.
    Michel, S., Triantafillou, P., Weikum, G.: KLEE: A Framework for Distributed Top-k Query Algorithms. In: VLDB 2005, pp. 637–648 (2005)Google Scholar
  27. 27.
    Michel, S., Bender, M., Triantafillou, P., Weikum, G.: IQN Routing: Integrating Quality and Novelty in P2P Querying and Ranking. In: Ioannidis, Y., Scholl, M.H., Schmidt, J.W., Matthes, F., Hatzopoulos, M., Böhm, K., Kemper, A., Grust, T., Böhm, C. (eds.) EDBT 2006. LNCS, vol. 3896, pp. 149–166. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  28. 28.
    Mislove, A., Gummadi, K.P., Druschel, P.: Exploiting Social Networks for Internet Search. HotNets (2006)Google Scholar
  29. 29.
    Mokbel, M.F. (ed.): Special Issue on Spatial and Spatial-temporal Databases. IEEE Data Eng. Bull. 33(2) (March 2010)Google Scholar
  30. 30.
    Neumann, T., Bender, M., Michel, S., Schenkel, R., Triantafillou, P., Weikum, G.: Distributed top-k aggregation queries at large. Distributed and Parallel Databases 26(1), 3–27 (2009)CrossRefGoogle Scholar
  31. 31.
    Nguyen, L.T., Yee, W.G., Frieder, O.: Adaptive distributed indexing for structured peer-to-peer networks. In: CIKM 2008, pp. 1241–1250 (2008)Google Scholar
  32. 32.
    Nottelmann, H., Fuhr, N.: Comparing Different Architectures for Query Routing in Peer-to-Peer Networks. In: Lalmas, M., MacFarlane, A., Rüger, S.M., Tombros, A., Tsikrika, T., Yavlinsky, A. (eds.) ECIR 2006. LNCS, vol. 3936, pp. 253–264. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  33. 33.
    Ntarmos, N., Triantafillou, P., Weikum, G.: Distributed hash sketches: Scalable, efficient, and accurate cardinality estimation for distributed multisets. ACM Trans. Comput. Syst. 27(1) (2009)Google Scholar
  34. 34.
    Parreira, J.X., Castillo, C., Donato, D., Michel, S., Weikum, G.: The Juxtaposed approximate PageRank method for robust PageRank approximation in a peer-to-peer web search network. VLDB J. 17(2), 291–313 (2008)CrossRefGoogle Scholar
  35. 35.
    Podnar, I., Rajman, M., Luu, T., Klemm, F., Aberer, K.: Scalable Peer-to-Peer Web Retrieval with Highly Discriminative Keys. In: ICDE 2007, pp. 1096–1105 (2007)Google Scholar
  36. 36.
    Rowstron, A.I.T., Druschel, P.: Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems. In: Guerraoui, R. (ed.) Middleware 2001. LNCS, vol. 2218, pp. 329–350. Springer, Heidelberg (2001)CrossRefGoogle Scholar
  37. 37.
    Sozio, M., Parreira, J.X., Crecelius, T., Weikum, G.: Good Guys vs. Bad Guys: Countering Cheating in Peer-to-Peer Authority Computations over Social Networks. In: WebDB (2008)Google Scholar
  38. 38.
    Steinmetz, R., Wehrle, K.: Peer-to-Peer Systems and Applications. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  39. 39.
    Stoica, I., Morris, R., Karger, D.R., Frans Kaashoek, M., Balakrishnan, H.: Chord: A scalable peer-to-peer lookup service for internet applications. In: SIGCOMM 2001, pp. 149–160 (2001)Google Scholar
  40. 40.
    Tang, C., Xu, Z., Dwarkadas, S.: Peer-to-peer information retrieval using self-organizing semantic overlay networks. In: SIGCOMM 2003, pp. 175–186 (2003)Google Scholar
  41. 41.
    Terpstra, W.W., Kangasharju, J., Leng, C., Buchmann, A.P.: Bubblestorm: resilient, probabilistic, and exhaustive peer-to-peer search. In: SIGCOMM 2007, pp. 49–60 (2007)Google Scholar
  42. 42.
    Terpstra, W.W., Behnel, S., Fiege, L., Zeidler, A., Buchmann, A.P.: A peer-to-peer approach to content-based publish/subscribe. In: DEBS 2003 (2003)Google Scholar
  43. 43.
    Tryfonopoulos, C., Koubarakis, M., Drougas, Y.: Information filtering and query indexing for an information retrieval model. ACM Trans. Inf. Syst. 27(2) (2009)Google Scholar
  44. 44.
    Tummarello, G., Cyganiak, R., Catasta, M., Danielczyk, S., Delbru, R., Decker, S.: Sig.ma: live views on the web of data. In: WWW 2010, pp. 1301–1304 (2010)Google Scholar
  45. 45.
    van Renesse, R., Birman, K.P., Vogels, W.: Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining. ACM Trans. Comput. Syst. 21(2), 164–206 (2003)CrossRefGoogle Scholar
  46. 46.
    Yalagandula, P., Dahlin, M.: A scalable distributed information management system. In: SIGCOMM 2004, pp. 379–390 (2004)Google Scholar
  47. 47.
    Yu, H., Li, H.-G., Wu, P., Agrawal, D., El Abbadi, A.: Efficient Processing of Distributed Top-k Queries. In: Andersen, K.V., Debenham, J., Wagner, R. (eds.) DEXA 2005. LNCS, vol. 3588, pp. 65–74. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  48. 48.
    Zimmer, C., Tryfonopoulos, C., Weikum, G.: Exploiting correlated keywords to improve approximate information filtering. In: SIGIR 2008, pp. 323–330 (2008)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Gerhard Weikum
    • 1
  1. 1.Max-Planck Institute for InformaticsSaarbrueckenGermany

Personalised recommendations