
System Software for Many-Core and Multi-core Architecture

  • Atsushi Hori
  • Yuichi Tsujita
  • Akio Shimada
  • Kazumi Yoshinaga
  • Mitaro Namiki
  • Go Fukazawa
  • Mikiko Sato
  • George Bosilca
  • Aurélien Bouteiller
  • Thomas Herault

Abstract

In this project, system software technologies for post-petascale computing were explored. More specifically, OS technologies for heterogeneous architectures, lightweight threads, scalable I/O, and fault mitigation were investigated. Regarding the OS technologies, a new parallel execution model, Partitioned Virtual Address Space (PVAS), was proposed for many-core CPUs. For heterogeneous architectures, where a multi-core CPU and a many-core CPU are connected by an I/O bus, an extension of PVAS, Multiple-PVAS, was proposed to provide a unified virtual address space spanning the multi-core and many-core CPUs. PVAS was further enhanced so that multiple processes can switch contexts at the user level (User-Level Processes: ULP). For scalable I/O, EARTH, a set of optimization techniques for MPI collective I/O, was proposed. Lastly, for fault mitigation, User-Level Fault Mitigation (ULFM) was improved with a faster agreement algorithm, and a sliding method that substitutes spare nodes for failed nodes was proposed. The funding of this project ended in 2016; however, many of the proposed technologies are still being actively developed.
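The fault-mitigation work builds on the ULFM extensions to MPI. As a rough illustration, the sketch below is written in C and assumes an MPI library that provides the ULFM extensions (MPIX_Comm_revoke, MPIX_Comm_shrink, MPIX_Comm_agree), such as the ULFM fork of Open MPI; the application logic around those calls is hypothetical. It shows the typical recovery pattern: detect a failure through an error return, agree collectively on whether recovery is needed, then revoke and shrink the communicator.

```c
/*
 * A minimal sketch of ULFM-style failure mitigation in MPI (C).
 * It assumes an MPI library that ships the ULFM extensions;
 * the surrounding application logic is hypothetical.
 */
#include <mpi.h>
#include <mpi-ext.h>   /* prototypes of the MPIX_* ULFM extensions */

/* Collectively replace a communicator that has lost processes. */
static int repair_comm(MPI_Comm *comm)
{
    MPI_Comm shrunk;

    /* Ensure every surviving rank stops using the broken communicator. */
    MPIX_Comm_revoke(*comm);

    /* Build a new communicator that excludes the failed processes. */
    if (MPIX_Comm_shrink(*comm, &shrunk) != MPI_SUCCESS)
        return MPI_ERR_OTHER;

    MPI_Comm_free(comm);
    *comm = shrunk;
    return MPI_SUCCESS;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm work;
    MPI_Comm_dup(MPI_COMM_WORLD, &work);
    /* Return error codes instead of aborting, so failures can be handled. */
    MPI_Comm_set_errhandler(work, MPI_ERRORS_RETURN);

    /* A stand-in for one phase of application work. */
    int rc = MPI_Barrier(work);
    int ok = (rc == MPI_SUCCESS);

    /* Collectively decide whether anyone observed a failure; this is the
     * agreement step whose algorithm the project worked to speed up. */
    MPIX_Comm_agree(work, &ok);
    if (!ok) {
        repair_comm(&work);
        /* ... redistribute the lost work on the repaired communicator ... */
    }

    MPI_Comm_free(&work);
    MPI_Finalize();
    return 0;
}
```

Note that ULFM deliberately leaves the recovery policy to the application; the library only provides the detection, revocation, shrinking, and agreement primitives used above.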


Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Atsushi Hori (1)
  • Yuichi Tsujita (2)
  • Akio Shimada (3)
  • Kazumi Yoshinaga (4)
  • Mitaro Namiki (5)
  • Go Fukazawa (7)
  • Mikiko Sato (6)
  • George Bosilca (8)
  • Aurélien Bouteiller (8)
  • Thomas Herault (8)
  1. RIKEN R-CCS, Minato-ku, Japan
  2. RIKEN R-CCS, Kobe, Japan
  3. Research and Development Group, Hitachi, Ltd., Yokohama, Japan
  4. eF-4 Co., Ltd., Meguro-ku, Japan
  5. Tokyo University of Agriculture and Technology, Koganei-shi, Japan
  6. Department of Embedded Technology, School of Information and Telecommunication Engineering, Tokai University, Minato-ku, Japan
  7. Yamaha Corp., Naka-ku, Japan
  8. Innovative Computing Laboratory, University of Tennessee, Knoxville, USA
