
Improving Load Balance and Fault Tolerance for PC Cluster-Based Parallel Information Retrieval

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3019)

Abstract

Information service providers and companies have typically used expensive mid-range or mainframe computers when they need a high-performance information retrieval system for massive data sources such as the Internet. In recent years, companies have begun to consider PC cluster systems as an alternative because of their cost-effectiveness and high scalability. However, if some of the cluster nodes break down, users may have to wait a long time or, in the worst case, may not get any results at all. This paper presents a duplicated data declustering method for PC cluster-based parallel information retrieval that achieves fault tolerance and improves load balance efficiently and at low cost. The effectiveness of the method has been confirmed by experiments with a corpus of two million newspaper articles on an 8-node PC cluster.
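The abstract does not describe how the duplicated declustering is laid out. One well-known way to duplicate declustered data so that a node failure neither loses data nor overloads a single survivor is chained declustering. In this scheme, partition p has its primary copy on node p and a backup copy on node (p + 1) mod N.

The sketch below assumes that scheme, one unit-sized partition per node, and a single failed node. It shows how work can be shifted along the chain so that every surviving node serves an equal share. It is an illustration only, not necessarily the authors' method. The function names `chained_layout`, `rebalance_after_failure`, and `node_loads` are hypothetical.

```python
from fractions import Fraction


def chained_layout(num_nodes):
    # Hypothetical chained layout: partition p keeps its primary copy on node p
    # and its backup copy on node (p + 1) % num_nodes.
    return {p: (p, (p + 1) % num_nodes) for p in range(num_nodes)}


def rebalance_after_failure(num_nodes, failed):
    """Return {partition: {node: fraction of that partition the node serves}}
    after a single node failure. Load is shifted along the chain so that each
    of the num_nodes - 1 survivors serves exactly num_nodes / (num_nodes - 1)."""
    if num_nodes < 2:
        raise ValueError("need at least two nodes")
    if not 0 <= failed < num_nodes:
        raise ValueError("failed node out of range")
    layout = chained_layout(num_nodes)
    target = Fraction(num_nodes, num_nodes - 1)  # equal share of N unit partitions
    shares = {p: {} for p in range(num_nodes)}
    carried = Fraction(1)  # part of the previous partition left for its backup node
    for k in range(1, num_nodes):
        node = (failed + k) % num_nodes
        prev = (node - 1) % num_nodes
        _, backup = layout[prev]  # backup copy of `prev` lives on `node`
        shares[prev][backup] = carried
        own = target - carried  # remaining capacity goes to the node's own primary copy
        shares[node][node] = own
        carried = 1 - own  # rest of this partition shifts to the next node's backup copy
    return shares


def node_loads(shares):
    # Sum the fractions each node serves across all partitions.
    loads = {}
    for per_node in shares.values():
        for node, frac in per_node.items():
            loads[node] = loads.get(node, 0) + frac
    return loads
```

For example, on an 8-node cluster with node 3 down, `node_loads(rebalance_after_failure(8, 3))` gives each of the seven survivors a load of `Fraction(8, 7)`. A naive fallback would instead double the load on node 4 alone. The fractional split assumes a query's work on a partition can be divided by key range between the primary and backup copies.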

This work was funded by the University Research Program supported by the Ministry of Information and Communication in Korea under contract 2002-005-3.




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kang, J., Ahn, H., Jung, SW., Ryu, K.R., Kwon, HC., Chung, SH. (2004). Improving Load Balance and Fault Tolerance for PC Cluster-Based Parallel Information Retrieval. In: Wyrzykowski, R., Dongarra, J., Paprzycki, M., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2003. Lecture Notes in Computer Science, vol 3019. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24669-5_89


  • DOI: https://doi.org/10.1007/978-3-540-24669-5_89

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21946-0

  • Online ISBN: 978-3-540-24669-5

  • eBook Packages: Springer Book Archive
