Parallel DSIR Text Retrieval System
We study the applicability of a distributed computing technique to a million-page free-text document retrieval problem. We propose a high-performance DSIR retrieval algorithm that runs on a Beowulf cluster of Pentium PCs using the PVM message-passing library. DSIR is a vector-space retrieval model in which the semantic similarity between documents and queries is characterized by semantic vectors derived from the document collection. Retrieving relevant answers then amounts to computing the geometric proximity between a large number of document vectors and the query vectors in a semantic vector space. We test the parallel DSIR algorithm on a large-scale TREC-7 collection, present the experimental results, and investigate both computing performance and problem-size scalability.
Keywords: Semantic Similarity, Document Collection, Computing Node, Retrieval Algorithm, Input Query
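The retrieval step described in the abstract, ranking documents by the proximity of their vectors to a query vector with the work divided among computing nodes, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the proximity measure (cosine similarity is used below), the vectors are toy numbers rather than DSIR semantic vectors, and the "nodes" are plain in-process partitions standing in for PVM tasks. The function names `partition`, `local_top_k`, and `retrieve` are hypothetical.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def partition(doc_ids, n_nodes):
    """Split document ids into n_nodes round-robin partitions."""
    return [doc_ids[i::n_nodes] for i in range(n_nodes)]


def local_top_k(doc_vectors, doc_ids, query, k):
    """Work of one (hypothetical) computing node: score only its own partition."""
    scored = ((cosine(doc_vectors[d], query), d) for d in doc_ids)
    return heapq.nlargest(k, scored)


def retrieve(doc_vectors, query, k=2, n_nodes=2):
    """Master step: scatter partitions, gather local results, merge a global top-k."""
    parts = partition(sorted(doc_vectors), n_nodes)
    gathered = []
    for part in parts:  # on a cluster, each iteration would run on a separate node
        gathered.extend(local_top_k(doc_vectors, part, query, k))
    return [doc_id for _, doc_id in heapq.nlargest(k, gathered)]


# Toy vectors, assumed for illustration only.
docs = {
    "d1": [1.0, 0.0, 0.0],
    "d2": [0.0, 1.0, 0.0],
    "d3": [1.0, 1.0, 0.0],
    "d4": [0.0, 0.0, 1.0],
}
ranking = retrieve(docs, query=[1.0, 0.2, 0.0], k=2, n_nodes=2)
print(ranking)
```

Because each partition returns its own top-k before the merge, the merged ranking does not depend on how many partitions are used. That is what allows the scoring loop to be spread over nodes, with only k candidates per node sent back to the master.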