A High Availability Mechanism for Parallel File System

  • Hu Zhang
  • Weiguo Wu
  • Xiaoshe Dong
  • Depei Qian
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3756)

Abstract

Parallel file systems achieve high I/O throughput by dividing a file into multiple blocks and storing them on multiple I/O nodes. However, striping file data over multiple I/O nodes sacrifices the reliability and availability of the file system. This study develops a new mechanism, named Logic Mirror Ring (LMR), to improve the reliability and availability of parallel file systems. A logic mirror ring is built over all I/O nodes to indicate the mirror relationship among the nodes, i.e., each node maintains not only its own data but also the mirror data of other nodes. The fault-tolerance capability of the system is improved because the node that maintains the mirror data of a failed node takes over the requests addressed to the failed node. The mirror depth can be adjusted to different levels according to the reliability and availability requirements. A model is developed to evaluate the reliability and availability of parallel file systems, and the effects of LMR on both are studied. The results show that LMR can effectively improve the reliability and availability of parallel file systems.
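
The takeover behaviour described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the ring direction or the placement rule, so the sketch assumes that the mirror copies of node i are held by its next `depth` successors on the ring. The function names `mirror_holders`, `route_request`, and `system_available` are hypothetical.

```python
from typing import List, Optional, Set


def mirror_holders(node: int, num_nodes: int, depth: int) -> List[int]:
    """Nodes assumed to hold mirror copies of `node`'s data.

    Assumption (not stated in the abstract): mirrors are placed on the
    next `depth` successors of `node` on the ring.
    """
    if not 0 <= depth < num_nodes:
        raise ValueError("depth must satisfy 0 <= depth < num_nodes")
    return [(node + k) % num_nodes for k in range(1, depth + 1)]


def route_request(target: int, num_nodes: int, depth: int,
                  failed: Set[int]) -> Optional[int]:
    """Return the node that serves a request addressed to `target`.

    The target itself is tried first, then its mirror holders in ring
    order. None means every copy of the target's data is on a failed node.
    """
    for candidate in [target] + mirror_holders(target, num_nodes, depth):
        if candidate not in failed:
            return candidate
    return None


def system_available(num_nodes: int, depth: int, failed: Set[int]) -> bool:
    """True if every node's data can still be served by some live node."""
    return all(route_request(n, num_nodes, depth, failed) is not None
               for n in range(num_nodes))


if __name__ == "__main__":
    # Five I/O nodes, mirror depth 1: node 3 takes over for failed node 2.
    print(route_request(2, 5, 1, {2}))
    # Two adjacent failures defeat depth 1 but not depth 2.
    print(system_available(5, 1, {2, 3}), system_available(5, 2, {2, 3}))
```

Under this assumed placement, a mirror depth of d tolerates any failure pattern that contains no run of d + 1 adjacent failed nodes on the ring, which is one way to read the claim that the depth can be tuned to the required level of reliability and availability.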

Keywords

Repair Rate; Distribute File System; Metadata Server; Distribute Storage System; Backup Data
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Hu Zhang (1)
  • Weiguo Wu (1)
  • Xiaoshe Dong (1)
  • Depei Qian (1, 2)
  1. Department of Computer Science, Xi’an Jiaotong Univ., Xi’an, Shaanxi, China
  2. School of Computer Science, Beihang Univ., Beijing, China
