Dawning Nebulae: A PetaFLOPS Supercomputer with a Heterogeneous Structure

Journal of Computer Science and Technology

Abstract

Dawning Nebulae is a heterogeneous system composed of 9280 multi-core x86 CPUs and 4640 NVIDIA Fermi GPUs. With a Linpack performance of 1.271 petaFLOPS, it was ranked second in the TOP500 list released in June 2010. This paper introduces the key issues in the system design of Dawning Nebulae. It presents the system tuning methodologies used to reach a petaFLOPS Linpack result, including algorithmic optimization and communication improvement. It also describes the design of the file I/O subsystem, including HVFS and the underlying DCFS3. Performance evaluations show that the Linpack efficiency of each node reaches 69.89%, and that the 1024-node aggregate read and write bandwidths exceed 100 GB/s and 70 GB/s, respectively. The success of Dawning Nebulae has demonstrated the viability of a CPU/GPU heterogeneous structure for future supercomputer designs.
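The figures quoted in the abstract can be related with some simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the abstract does not state the theoretical peak of a node (so the efficiency function is shown without a real peak value), and decimal units (1 GB = 1000 MB) are an assumption.

```python
def linpack_efficiency(rmax_flops: float, rpeak_flops: float) -> float:
    """Measured Linpack performance as a fraction of theoretical peak."""
    if rpeak_flops <= 0:
        raise ValueError("rpeak_flops must be positive")
    return rmax_flops / rpeak_flops


def per_node_bandwidth_mb(aggregate_gb_per_s: float, nodes: int) -> float:
    """Average per-node share of an aggregate bandwidth, in MB/s.

    Assumes decimal units (1 GB = 1000 MB); the abstract does not say
    which convention the authors used.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return aggregate_gb_per_s * 1000.0 / nodes


# Component counts quoted in the abstract.
CPUS, GPUS = 9280, 4640
cpus_per_gpu = CPUS / GPUS  # two CPUs for every GPU

# The abstract reports aggregate bandwidths "exceeding" 100 GB/s (read) and
# 70 GB/s (write) on 1024 nodes, so these are lower bounds on the average.
read_floor_mb = per_node_bandwidth_mb(100.0, 1024)
write_floor_mb = per_node_bandwidth_mb(70.0, 1024)

if __name__ == "__main__":
    print(f"CPUs per GPU: {cpus_per_gpu:.1f}")
    print(f"Per-node read bandwidth  > {read_floor_mb:.1f} MB/s")
    print(f"Per-node write bandwidth > {write_floor_mb:.1f} MB/s")
```

Under these assumptions, each node sustains on average a little under 100 MB/s of read bandwidth and roughly 68 MB/s of write bandwidth. The 69.89% figure is a per-node efficiency, so it should not be combined with the 1.271 petaFLOPS system result to infer a system-wide peak.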


Author information

Corresponding author

Correspondence to Ning-Hui Sun.

Additional information

This work is supported by the National Hi-Tech Research and Development 863 Program of China under Grant No. 2009AA01A129, the National Natural Science Foundation of China under Grant Nos. 60633040, 60803030, and 61033009, the National Basic Research 973 Program of China under Grant No. 2011CB302500, the National Natural Science Foundation for Distinguished Young Scholars of China under Grant No. 60925009, and the Foundation for Innovative Research Groups of the National Natural Science Foundation of China under Grant No. 60921002.

About this article

Cite this article

Sun, NH., Xing, J., Huo, ZG. et al. Dawning Nebulae: A PetaFLOPS Supercomputer with a Heterogeneous Structure. J. Comput. Sci. Technol. 26, 352–362 (2011). https://doi.org/10.1007/s11390-011-1138-3
