Parallel DSIR Text Retrieval System

  • Arnon Rungsawang
  • Athichat Tangpong
  • Pawat Laohawee
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1697)

Abstract

We present a study of the applicability of distributed computing techniques to a million-page free-text document retrieval problem. We propose a high-performance DSIR retrieval algorithm for a Beowulf cluster of Pentium PCs, built on the PVM message-passing library. DSIR is a vector-space retrieval model in which the semantic similarity between documents and queries is characterized by semantic vectors derived from the document collection. Retrieving relevant answers then amounts to computing the geometric proximity between a large number of document vectors and query vectors in a semantic vector space. We test the parallel DSIR algorithm on the large-scale TREC-7 collection and report experimental results on both computing performance and scalability with problem size.
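The abstract does not say how the proximity computation is divided among cluster nodes. A common data-parallel pattern, assumed here, is to partition the document vectors across workers, have each worker score its partition against the query and keep only its local top k, and let a master merge the partial lists. The Python sketch below uses a thread pool as a stand-in for PVM worker tasks (with PVM, partitions and partial results would travel as messages between processes), assumes cosine similarity as the proximity measure, and uses hypothetical names (`parallel_search`, `local_top_k`, `split_round_robin`). It is an illustration, not the authors' implementation.

```python
from __future__ import annotations

import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v (assumed proximity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector is treated as unrelated to everything
    return dot / (norm_u * norm_v)


def split_round_robin(doc_ids: list[str], n_workers: int) -> list[list[str]]:
    """Hypothetical partitioning: assign documents to workers in round-robin order."""
    return [doc_ids[i::n_workers] for i in range(n_workers)]


def local_top_k(docs: dict[str, list[float]], ids: list[str],
                query: list[float], k: int) -> list[tuple[float, str]]:
    """Worker step: score one partition and keep its k best (score, doc_id) pairs."""
    return heapq.nlargest(k, ((cosine(docs[d], query), d) for d in ids))


def parallel_search(docs: dict[str, list[float]], query: list[float],
                    k: int, n_workers: int = 4) -> list[tuple[str, float]]:
    """Master step: scatter partitions, gather local top-k lists, merge them."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    partitions = split_round_robin(sorted(docs), n_workers)
    # Threads stand in for remote worker tasks; each returns its local ranking.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda ids: local_top_k(docs, ids, query, k), partitions))
    candidates = [hit for part in partials for hit in part]
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, candidates)]


if __name__ == "__main__":
    # Toy 3-dimensional "semantic vectors" (made up for illustration).
    collection = {
        "d1": [1.0, 0.0, 0.0],
        "d2": [0.7, 0.7, 0.0],
        "d3": [0.0, 1.0, 0.0],
        "d4": [0.0, 0.0, 1.0],
    }
    print(parallel_search(collection, [1.0, 0.2, 0.0], k=2, n_workers=3))  # d1 first, then d2
```

Keeping only a local top k on each worker loses nothing: any document in the global top k must also rank in the top k of its own partition, so the merge is exact while each worker sends back at most k results instead of scores for its whole partition. For a batch of queries, each worker would simply repeat the local scoring step per query before replying.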

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Arnon Rungsawang (1)
  • Athichat Tangpong (1)
  • Pawat Laohawee (1)
  1. KU Text REtrieval Group (KU-TREG), Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand