I/O Interference Alleviation on Parallel File Systems Using Server-Side QoS-Based Load-Balancing
Abstract
Storage performance in supercomputers is variable, depending not only on an application’s workload but also on the types of other concurrent I/O activities. In particular, performance degradation in meta-data accesses leads to poor storage performance across applications running at the same time. We herein focus on two representative performance problems, high load and slow response of a meta-data server, through analysis of meta-data server activities using file system performance metrics on the K computer. We investigate the root causes of such performance problems through MDTEST benchmark runs and confirm the performance improvement by server-side quality-of-service management in service thread assignment for incoming client requests on a meta-data server.
Keywords
Lustre FEFS MDS OSS Data-staging QoS K computerNotes
Acknowledgment
The results of this paper were obtained using the K computer.
References
- 1.Brim, M.J., Lothian, J.K.: Monitoring extreme-scale Lustre toolkit. In: Proceedings of the International Workshop on the Lustre Ecosystem: Challenges and Opportunities (2015). http://arxiv.org/html/1506.05323
- 2.Crosby, L.D., Mohr, R.: Petascale I/O: challenges, solutions, and recommendations. In: Proceedings of the Extreme Scaling Workshop, BW-XSEDE 2012, pp. 7:1–7:7. University of Illinois at Urbana-Champaign (2012)Google Scholar
- 3.Ezell, M., Mohr, R., Wynkoop, J., Braby, R.: Lustre at petascale: experiences in troubleshooting and upgrading. In: 2012 Cray User Group Meeting (2012)Google Scholar
- 4.Hirai, K., Iguchi, Y., Uno, A., Kurokawa, M.: Operations management software for the K computer. Fujitsu Sci. Tech. J. 48(3), 310–316 (2012)Google Scholar
- 5.Lustre. http://lustre.org/
- 6.MDTEST. https://github.com/hpc/ior
- 7.Mohr, R., Brim, M., Oral, S., Dilger, A.: Evaluating progressive file layouts for Lustre (2016). http://lustre.ornl.gov/ecosystem-2016/
- 8.Morrone, C.: LMT Lustre monitoring tools. In: Lustre User Group 2011 (2011)Google Scholar
- 9.Qian, Y., Barton, E., Wang, T., Puntambekar, N., Dilger, A.: A novel network request scheduler for a large scale storage system. Comput. Sci. - Res. Dev. 23(3), 143–148 (2009)CrossRefGoogle Scholar
- 10.Qian, Y., et al.: A configurable rule based classful token bucket filter network request scheduler for the Lustre file system. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017, pp. 6:1–6:12. ACM (2017)Google Scholar
- 11.Qian, Y., Yi, R., Du, Y., Xiao, N., Jin, S.: Dynamic I/O congestion control in scalable Lustre file system. In: IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST 2013), pp. 1–5, May 2013Google Scholar
- 12.Reed, J., Archuleta, J., Brim, M.J., Lothian, J.: Evaluating dynamic file striping for Lustre. In: Proceedings of the International Workshop on the Lustre Ecosystem: Challenges and Opportunities (2015). http://arxiv.org/html/1506.05323
- 13.Saini, S., Rappleye, J., Chang, J., Barker, D., Mehrotra, P., Biswas, R.: I/O performance characterization of Lustre and NASA applications on Pleiades. In: 19th International Conference on High Performance Computing (HiPC), pp. 1–10 (2012)Google Scholar
- 14.Sakai, K., Sumimoto, S., Kurokawa, M.: High-performance and highly reliable file system for the K computer. Fujitsu Sci. Tech. J. 48(3), 302–309 (2012)Google Scholar
- 15.Schmuck, F., Haskin, R.: GPFS: a shared-disk file system for large computing clusters. In: Proceedings of the 1st USENIX Conference on File and Storage Technologies, FAST 2002, USENIX Association (2002)Google Scholar
- 16.Sumimoto, S.: An overview of Fujitsu’s Lustre based file system. In: Lustre User Group 2011 (2011)Google Scholar
- 17.Uselton, A.: Deploying server-side file system monitoring at NERSC. In: 2009 Cray User Group Meeting (2009)Google Scholar
- 18.Uselton, A., Wright, N.: A file system utilization metric for I/O characterization. In: 2013 Cray User Group Meeting (2013)Google Scholar
- 19.Zhang, X., Davis, K., Jiang, S.: QoS support for end users of I/O-intensive applications using shared storage systems. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2011, pp. 18:1–18:12. ACM (2011)Google Scholar