A High Availability Mechanism for Parallel File System

Zhang, Hu; Wu, Weiguo; Dong, Xiaoshe; Qian, Depei

doi:10.1007/11573937_22

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3756)

Included in the following conference series:

International Workshop on Advanced Parallel Processing Technologies

Abstract

Parallel file systems achieve high I/O throughput by dividing a file into multiple blocks and storing them on multiple I/O nodes. However, striping file data over multiple I/O nodes sacrifices the reliability and availability of the parallel file system. In this study, a new mechanism named Logic Mirror Ring (LMR) has been developed to improve the reliability and availability of parallel file systems. A logic mirror ring is built over all I/O nodes to indicate the mirror relationship among the nodes, i.e., each node maintains not only its own data but also the mirror data of other nodes. The fault-tolerant capability of the system is improved because a node maintaining the mirror data of a failed node takes over the requests to the failed node. The mirror depth can be adjusted to different levels based on the reliability and availability requirements. A model is developed to evaluate the reliability and availability of parallel file systems, and the effects of LMR on the reliability and availability of the parallel file system are studied. The results show that LMR can effectively improve the reliability and availability of parallel file systems.
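
The takeover behaviour described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which nodes hold a given node's mirror data, so the sketch assumes that node i is mirrored on its next `depth` successors in the ring. The function names `mirror_holders`, `route_request` and `all_data_available` are hypothetical.

```python
from typing import List, Optional, Set


def mirror_holders(node: int, num_nodes: int, depth: int) -> List[int]:
    """Nodes assumed to hold mirror copies of `node`'s data.

    Assumption (not stated in the abstract): mirrors are placed on the
    next `depth` successors of `node` in the ring.
    """
    if not 0 <= depth < num_nodes:
        raise ValueError("depth must satisfy 0 <= depth < num_nodes")
    return [(node + k) % num_nodes for k in range(1, depth + 1)]


def route_request(node: int, num_nodes: int, depth: int,
                  failed: Set[int]) -> Optional[int]:
    """Return the node that serves a request aimed at `node`.

    The owner is tried first, then its mirror holders in ring order.
    None means every copy of the data sits on a failed node.
    """
    for candidate in [node] + mirror_holders(node, num_nodes, depth):
        if candidate not in failed:
            return candidate
    return None


def all_data_available(num_nodes: int, depth: int, failed: Set[int]) -> bool:
    """True if every node's data is still reachable on some live node."""
    return all(
        route_request(n, num_nodes, depth, failed) is not None
        for n in range(num_nodes)
    )


if __name__ == "__main__":
    # Six I/O nodes, mirror depth 1: node 3 takes over for failed node 2.
    print(route_request(2, 6, 1, {2}))
    # Two adjacent failures defeat depth 1 but not depth 2.
    print(all_data_available(6, 1, {2, 3}))
    print(all_data_available(6, 2, {2, 3}))
```

Under this assumed placement, data becomes unreachable only when a node and all of its `depth` successors fail together, which is one way to read the abstract's statement that the mirror depth can be raised to meet stricter reliability and availability requirements.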

This research is supported by the National 863 Plan under grants No. 2004AA111110 and No. 2002AA104550.

Author information

Authors and Affiliations

Department of Computer Science, Xi’an Jiaotong Univ., Xi’an, Shaanxi, 710049, China
Hu Zhang, Weiguo Wu, Xiaoshe Dong & Depei Qian
School of Computer Science, Beihang Univ., Beijing, 100083, China
Depei Qian

Editor information

Editors and Affiliations

Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong, China
Jiannong Cao
L3S Research Center, Leibniz Universität Hannover, Appelstrasse 9a, 30167, Hannover, Germany
Wolfgang Nejdl
Department of Network Engineering, School of Computer Science, National University of Defense Technology, 410073, Changsha, China
Ming Xu

Cite this paper

Zhang, H., Wu, W., Dong, X., Qian, D. (2005). A High Availability Mechanism for Parallel File System. In: Cao, J., Nejdl, W., Xu, M. (eds) Advanced Parallel Processing Technologies. APPT 2005. Lecture Notes in Computer Science, vol 3756. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573937_22

DOI: https://doi.org/10.1007/11573937_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29639-3
Online ISBN: 978-3-540-32107-1
eBook Packages: Computer Science, Computer Science (R0)
