Aggregation of Document Frequencies in Unstructured P2P Networks

  • Conference paper
Web Information Systems Engineering - WISE 2009 (WISE 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5802)

Abstract

Peer-to-peer (P2P) systems have recently been proposed for providing search and information retrieval facilities over distributed data sources, including web data. Terms and their document frequencies are the main building blocks of retrieval and as such need to be computed, aggregated, and distributed throughout the system. This is a challenging task, because skewed document distributions mean that the local view of each peer may not reflect the global document collection. Moreover, central assembly of the total information is not feasible, both because of the prohibitive cost of storage and maintenance and because of issues related to digital rights management. In this paper, we propose an efficient approach, based on a hierarchical overlay network, for aggregating the document frequencies of carefully selected terms. To this end, we examine unsupervised feature selection techniques at the individual peer level in order to identify only a limited set of the most important terms for aggregation. We provide a theoretical analysis of the cost of our approach, and we conduct experiments on two document collections to measure the quality of the aggregated document frequencies.
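The general idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names, the use of raw local document frequency as the selection criterion, and the single level of aggregation are all assumptions made for illustration. Each peer counts its local document frequencies, keeps only its `k` highest-ranked terms, and a parent node in the hierarchy sums the summaries it receives.

```python
from collections import Counter
from typing import Dict, List


def local_document_frequencies(docs: List[List[str]]) -> Counter:
    """Count, for each term, how many of a peer's documents contain it."""
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # set(): a term counts once per document
    return df


def select_top_terms(df: Counter, k: int) -> Dict[str, int]:
    """Keep the k terms with the highest local document frequency.

    Assumption: plain document frequency stands in for whatever
    unsupervised feature selection a real system would use.
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:k])


def aggregate(summaries: List[Dict[str, int]]) -> Dict[str, int]:
    """Sum the document frequencies reported by child nodes."""
    total: Counter = Counter()
    for summary in summaries:
        total.update(summary)  # Counter.update adds counts per key
    return dict(total)


# Two hypothetical peers, each holding two tokenized documents.
peer_a = [["p2p", "search"], ["p2p", "index"]]
peer_b = [["p2p", "overlay"], ["overlay", "search"]]

summaries = [
    select_top_terms(local_document_frequencies(peer), k=2)
    for peer in (peer_a, peer_b)
]
aggregated = aggregate(summaries)
```

In this toy run the term `search` appears in one document at each peer but is not selected by either, so it is missing from the aggregate. This is the kind of approximation error that measuring the quality of aggregated document frequencies would need to capture.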

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

Cite this paper

Neumayer, R., Doulkeridis, C., Nørvåg, K. (2009). Aggregation of Document Frequencies in Unstructured P2P Networks. In: Vossen, G., Long, D.D.E., Yu, J.X. (eds) Web Information Systems Engineering - WISE 2009. WISE 2009. Lecture Notes in Computer Science, vol 5802. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04409-0_8

  • DOI: https://doi.org/10.1007/978-3-642-04409-0_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04408-3

  • Online ISBN: 978-3-642-04409-0

  • eBook Packages: Computer Science (R0)
