Abstract
A content-addressable network (CAN) is a scalable and robust distributed hash table that enables distributed applications to store and retrieve information efficiently. We consider design and implementation issues of a document sharing system over a content-addressable overlay network. Improvements and their applicability to a document sharing system are discussed. We describe our system prototype, in which a hierarchical text classification approach is proposed as an alternative hash function to decompose the dimensionality into lower-dimensional realities. Properties of hierarchical document categories are used to obtain probabilistic class labels, which also improve search accuracy.
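To make the idea of a classification-based hash function concrete, the following is a minimal sketch, not the prototype described in the paper. It contrasts a conventional uniform hash, which scatters keys over the coordinate space, with a hypothetical mapping that uses a document's class probabilities directly as coordinates, so that documents with similar labels land in the same zone. All function names, the fixed grid of equally sized zones, and the example probabilities are assumptions made for illustration; a real content-addressable network splits zones dynamically as nodes join, and the hierarchy and multiple realities mentioned in the abstract are not modeled here.

```python
import hashlib
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def uniform_hash_point(key: str, dims: int) -> Point:
    """Baseline: hash a key to a pseudo-random point in [0, 1)^dims."""
    if not 1 <= dims <= 8:
        # SHA-256 yields 32 bytes; this sketch spends 4 bytes per coordinate.
        raise ValueError("dims must be between 1 and 8")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return tuple(
        int.from_bytes(digest[4 * i: 4 * i + 4], "big") / 2**32
        for i in range(dims)
    )


def class_label_point(probs: Sequence[float]) -> Point:
    """Hypothetical alternative: use normalized class probabilities as coordinates.

    One coordinate per category; values are clamped just below 1.0 so the
    point stays inside the half-open unit cube.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    return tuple(min(p / total, 1.0 - 1e-9) for p in probs)


def owner_zone(point: Point, splits: int) -> Tuple[int, ...]:
    """Return the grid cell containing the point (simplified, static zones)."""
    return tuple(int(x * splits) for x in point)


# Two documents with similar (made-up) class probabilities over three categories.
doc_a = class_label_point([0.70, 0.20, 0.10])
doc_b = class_label_point([0.68, 0.22, 0.10])
same_zone = owner_zone(doc_a, 4) == owner_zone(doc_b, 4)
```

Under these assumptions, both example documents fall into the same grid cell, whereas the uniform hash would in general place them in unrelated cells. This locality is the property a classification-based hash function is meant to provide for searching.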
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Elmas, T., Ozkasap, O. (2004). Distributed Document Sharing with Text Classification over Content-Addressable Network. In: Chi, CH., Lam, KY. (eds) Content Computing. AWCC 2004. Lecture Notes in Computer Science, vol 3309. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30483-8_9
DOI: https://doi.org/10.1007/978-3-540-30483-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23898-0
Online ISBN: 978-3-540-30483-8
eBook Packages: Springer Book Archive