Skip to main content

Peer-to-Peer Web Search: Euphoria, Achievements, Disillusionment, and Future Opportunities

  • Chapter
From Active Data Management to Event-Based Systems and More

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6462))

  • 2323 Accesses

Abstract

The peer-to-peer (P2P) computing paradigm has been very successful like file sharing in Internet-wide communities (e.g., Gnutella, BitTorrent) or IP telephony (e.g., Skype). P2P systems promise perfect scalability from few peers to many millions, and resilience to failures, dynamic variability, and even misbehaving peers with egoistic or even malicious behavior. None of these salient properties requires any global planning, administration, or control; so P2P systems are completely self-organizing.

Web search seems to be a perfect match for P2P architectures. The Web has naturally distributed data, spread across the entire Internet, as opposed to artifically hosting all content by a centralized search engine. For user-provided contents in Web 2.0 communities, consideration of the content ownership, the autonomy of users, and the individualized control of privacy would also suggest decentralized solutions with many peers. Using the combined power and knowledge of millions of users and their computers could offer a more informative and pluralistic view of the world’s information. A P2P search engine could benefit from the intellectual input – bookmarks, queries, clicks – of a large user community, without undue risks about privacy or censorship, because users can gather logs on their own computers and control further sharing and aggregation by their individual policies. These potential benefits have motivated a wealth of exciting research on algorithms and systems for P2P Web search. This paper gives a brief overview on the last decade’s research achievements along these lines.

Despite all these intriguing promises and notwithstanding the impressive success of simpler file-sharing applications, P2P approaches to Web search or Web 2.0 services did not make a significant impact on the practical deployment side. The wave of P2P euphoria in academic research was followed by a phase of disillusionment about the lack of business models and user incentives. This paper discusses these shortcomings, and points out new opportunities for the P2P paradigm to play a more successful role in future Web applications.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Anand, A., Bedathur, S.J., Berberich, K., Schenkel, R., Tryfonopoulos, C.: EverLast: a distributed architecture for preserving the web. In: JCDL 2009, pp. 331–340 (2009)

    Google Scholar 

  2. Baeza-Yates, R.A., Castillo, C., Junqueira, F., Plachouras, V., Silvestri, F.: Challenges on Distributed Web Retrieval. In: ICDE 2007, pp. 6–20 (2007)

    Google Scholar 

  3. Balke, W.-T., Nejdl, W., Siberski, W., Thaden, U.: Progressive Distributed Top k Retrieval in Peer-to-Peer Networks. In: ICDE 2005, pp. 174–185 (2005)

    Google Scholar 

  4. Barroso, L.A., Dean, J., Hlzle, U.: Web Search for a Planet: The Google Cluster Architecture. IEEE Micro 23(2), 22–28 (2003)

    Article  Google Scholar 

  5. Bender, M., Ntarmos, N., Triantafillou, P., Weikum, G., Zimmer, C.: Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices. In: CIKM 2006, pp. 172–181 (2006)

    Google Scholar 

  6. Bender, M., Michel, S., Triantafillou, P., Weikum, G.: Global Document Frequency Estimation in Peer-to-Peer Web Search. In: WebDB (2006)

    Google Scholar 

  7. Bender, M., Michel, S., Parreira, J.X., Crecelius, T.: P2P Web Search: Make It Light, Make It Fly. In: CIDR 2007, pp. 164–168 (2007)

    Google Scholar 

  8. Borodin, A., Roberts, G.O., Rosenthal, J.S., Tsaparas, P.: Link analysis ranking: algorithms, theory, and experiments. ACM Trans. Internet Techn. 5(1), 231–297 (2005)

    Article  Google Scholar 

  9. Callan, J.P., Lu, Z., Bruce Croft, W.: Searching Distributed Collections with Inference Networks. SIGIR, 21–28 (1995)

    Google Scholar 

  10. Cao, P., Wang, Z.: Efficient top-K query calculation in distributed networks. In: PODC 2004, pp. 206–215 (2004)

    Google Scholar 

  11. Crespo, A., Garcia-Molina, H.: Semantic Overlay Networks for P2P Systems. In: Moro, G., Bergamaschi, S., Aberer, K. (eds.) AP2PC 2004. LNCS (LNAI), vol. 3601, pp. 1–13. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  12. Cuenca-Acuna, F.M., Peery, C., Martin, R.P., Nguyen, T.D.: PlanetP: Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities. In: HPDC 2003, pp. 236–249 (2003)

    Google Scholar 

  13. Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4), 614–656 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  14. Flajolet, P., Nigel Martin, G.: Probabilistic Counting Algorithms for Data Base Applications. J. Comput. Syst. Sci. 31(2), 182–209 (1985)

    Article  MathSciNet  MATH  Google Scholar 

  15. Gravano, L., Garcia-Molina, H., Tomasic, A.: GlOSS: Text-Source Discovery over the Internet. ACM Trans. Database Syst. 24(2), 229–264 (1999)

    Article  Google Scholar 

  16. Guha, R.V., Kumar, R., Raghavan, P., Tomkins, A.: Propagation of trust and distrust. In: WWW 2004, pp. 403–412 (2004)

    Google Scholar 

  17. Harth, A., Hose, K., Karnstedt, M., Polleres, A., Sattler, K.-U., Umbrich, J.: Data summaries for on-demand queries over linked data. In: WWW 2010, pp. 411–420 (2010)

    Google Scholar 

  18. Hartig, O., Bizer, C., Freytag, J.C.: Executing SPARQL Queries over the Web of Linked Data

    Google Scholar 

  19. Kalnis, P., Ng, W.S., Ooi, B.C., Tan, K.-L.: Answering similarity queries in peer-to-peer networks. Inf. Syst. 31(1), 57–72 (2006)

    Article  Google Scholar 

  20. Jelasity, M., Voulgaris, S., Guerraoui, R., Kermarrec, A.-M., van Steen, M.: Gossip-based peer sampling. ACM Trans. Comput. Syst. 25(3) (2007)

    Google Scholar 

  21. Kempe, D., Dobra, A., Gehrke, J.: Gossip-Based Computation of Aggregate Information. In: FOCS 2003, pp. 482–491 (2003)

    Google Scholar 

  22. Kempe, D., McSherry, F.: A decentralized algorithm for spectral analysis. In: STOC 2004, pp. 561–568 (2004)

    Google Scholar 

  23. Lu, J., Callan, J.P.: Content-based retrieval in hybrid peer-to-peer networks. In: CIKM 2003, pp. 199–206 (2003)

    Google Scholar 

  24. Mahlmann, P., Schindelhauer, C.: Distributed random digraph transformations for peer-to-peer networks. In: SPAA 2006, pp. 308–317 (2006)

    Google Scholar 

  25. Meng, W., Yu, C.T., Liu, K.-L.: Building efficient and effective metasearch engines. ACM Comput. Surv. 34(1), 48–89 (2002)

    Article  Google Scholar 

  26. Michel, S., Triantafillou, P., Weikum, G.: KLEE: A Framework for Distributed Top-k Query Algorithms. In: VLDB 2005, pp. 637–648 (2005)

    Google Scholar 

  27. Michel, S., Bender, M., Triantafillou, P., Weikum, G.: IQN Routing: Integrating Quality and Novelty in P2P Querying and Ranking. In: Ioannidis, Y., Scholl, M.H., Schmidt, J.W., Matthes, F., Hatzopoulos, M., Böhm, K., Kemper, A., Grust, T., Böhm, C. (eds.) EDBT 2006. LNCS, vol. 3896, pp. 149–166. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  28. Mislove, A., Gummadi, K.P., Druschel, P.: Exploiting Social Networks for Internet Search. HotNets (2006)

    Google Scholar 

  29. Mokbel, M.F. (ed.): Special Issue on Spatial and Spatial-temporal Databases. IEEE Data Eng. Bull. 33(2) (March 2010)

    Google Scholar 

  30. Neumann, T., Bender, M., Michel, S., Schenkel, R., Triantafillou, P., Weikum, G.: Distributed top-k aggregation queries at large. Distributed and Parallel Databases 26(1), 3–27 (2009)

    Article  Google Scholar 

  31. Nguyen, L.T., Yee, W.G., Frieder, O.: Adaptive distributed indexing for structured peer-to-peer networks. In: CIKM 2008, pp. 1241–1250 (2008)

    Google Scholar 

  32. Nottelmann, H., Fuhr, N.: Comparing Different Architectures for Query Routing in Peer-to-Peer Networks. In: Lalmas, M., MacFarlane, A., Rüger, S.M., Tombros, A., Tsikrika, T., Yavlinsky, A. (eds.) ECIR 2006. LNCS, vol. 3936, pp. 253–264. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  33. Ntarmos, N., Triantafillou, P., Weikum, G.: Distributed hash sketches: Scalable, efficient, and accurate cardinality estimation for distributed multisets. ACM Trans. Comput. Syst. 27(1) (2009)

    Google Scholar 

  34. Parreira, J.X., Castillo, C., Donato, D., Michel, S., Weikum, G.: The Juxtaposed approximate PageRank method for robust PageRank approximation in a peer-to-peer web search network. VLDB J. 17(2), 291–313 (2008)

    Article  Google Scholar 

  35. Podnar, I., Rajman, M., Luu, T., Klemm, F., Aberer, K.: Scalable Peer-to-Peer Web Retrieval with Highly Discriminative Keys. In: ICDE 2007, pp. 1096–1105 (2007)

    Google Scholar 

  36. Rowstron, A.I.T., Druschel, P.: Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems. In: Guerraoui, R. (ed.) Middleware 2001. LNCS, vol. 2218, pp. 329–350. Springer, Heidelberg (2001)

    Chapter  Google Scholar 

  37. Sozio, M., Parreira, J.X., Crecelius, T., Weikum, G.: Good Guys vs. Bad Guys: Countering Cheating in Peer-to-Peer Authority Computations over Social Networks. In: WebDB (2008)

    Google Scholar 

  38. Steinmetz, R., Wehrle, K.: Peer-to-Peer Systems and Applications. Springer, Heidelberg (2005)

    Book  Google Scholar 

  39. Stoica, I., Morris, R., Karger, D.R., Frans Kaashoek, M., Balakrishnan, H.: Chord: A scalable peer-to-peer lookup service for internet applications. In: SIGCOMM 2001, pp. 149–160 (2001)

    Google Scholar 

  40. Tang, C., Xu, Z., Dwarkadas, S.: Peer-to-peer information retrieval using self-organizing semantic overlay networks. In: SIGCOMM 2003, pp. 175–186 (2003)

    Google Scholar 

  41. Terpstra, W.W., Kangasharju, J., Leng, C., Buchmann, A.P.: Bubblestorm: resilient, probabilistic, and exhaustive peer-to-peer search. In: SIGCOMM 2007, pp. 49–60 (2007)

    Google Scholar 

  42. Terpstra, W.W., Behnel, S., Fiege, L., Zeidler, A., Buchmann, A.P.: A peer-to-peer approach to content-based publish/subscribe. In: DEBS 2003 (2003)

    Google Scholar 

  43. Tryfonopoulos, C., Koubarakis, M., Drougas, Y.: Information filtering and query indexing for an information retrieval model. ACM Trans. Inf. Syst. 27(2) (2009)

    Google Scholar 

  44. Tummarello, G., Cyganiak, R., Catasta, M., Danielczyk, S., Delbru, R., Decker, S.: Sig.ma: live views on the web of data. In: WWW 2010, pp. 1301–1304 (2010)

    Google Scholar 

  45. van Renesse, R., Birman, K.P., Vogels, W.: Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining. ACM Trans. Comput. Syst. 21(2), 164–206 (2003)

    Article  Google Scholar 

  46. Yalagandula, P., Dahlin, M.: A scalable distributed information management system. In: SIGCOMM 2004, pp. 379–390 (2004)

    Google Scholar 

  47. Yu, H., Li, H.-G., Wu, P., Agrawal, D., El Abbadi, A.: Efficient Processing of Distributed Top-k Queries. In: Andersen, K.V., Debenham, J., Wagner, R. (eds.) DEXA 2005. LNCS, vol. 3588, pp. 65–74. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  48. Zimmer, C., Tryfonopoulos, C., Weikum, G.: Exploiting correlated keywords to improve approximate information filtering. In: SIGIR 2008, pp. 323–330 (2008)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Weikum, G. (2010). Peer-to-Peer Web Search: Euphoria, Achievements, Disillusionment, and Future Opportunities. In: Sachs, K., Petrov, I., Guerrero, P. (eds) From Active Data Management to Event-Based Systems and More. Lecture Notes in Computer Science, vol 6462. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17226-7_12

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-17226-7_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17225-0

  • Online ISBN: 978-3-642-17226-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics