FFMK: An HPC OS Based on the L4Re Microkernel
The German research project FFMK aims to build a new HPC operating system platform that addresses the hardware and software challenges posed by future exascale systems. These challenges include massively increased parallelism (e.g., in the number of nodes and cores), performance variability, and most likely higher failure rates due to significantly increased component counts. We also expect more complex applications and the need to manage system resources more dynamically than on contemporary HPC platforms, which assign resources to applications statically. The project combines and adapts existing system-software building blocks that have already matured and proven themselves in other areas. At the lowest level, the architecture is based on a microkernel, which provides an extremely lightweight and fast execution environment that leaves as many resources as possible to applications. An instance of the microkernel controls each compute node; it is complemented by a virtualized Linux kernel that provides device drivers, compatibility with existing HPC infrastructure, and rich support for programming models and HPC runtimes such as MPI. Above the level of individual nodes, the system architecture includes distributed performance and health monitoring services as well as fault-tolerant information dissemination algorithms that enable failure handling and dynamic load management. In this chapter, we give an overview of the architecture of the FFMK operating system platform. The focus, however, is on the microkernel and how it integrates with Linux to form a multi-kernel operating system architecture.
We would like to thank the German priority program 1648 “Software for Exascale Computing” for supporting the project FFMK (FFMK 2019), the ESF-funded project microHPC (microHPC 2019), and the cluster of excellence “Center for Advancing Electronics Dresden” (cfaed). We also acknowledge the Jülich Supercomputing Centre, the Gauss Centre for Supercomputing, and the John von Neumann Institute for Computing for providing compute time on the JUQUEEN and JURECA supercomputers. We are also deeply grateful to TU Dresden’s ZIH for allowing us bare-metal access to nodes of their Taurus system, and to all our fellow researchers in the FFMK project for their advice, contributions, and friendly collaboration.
- Andersen, E. (2010). μClibc. https://uclibc.org.
- Barak, A., Drezner, Z., Levy, E., Lieber, M., & Shiloh, A. (2015). Resilient gossip algorithms for collecting online management information in exascale clusters. Concurrency and Computation: Practice and Experience, 27(17), 4797–4818.
- Beckman, P. et al. (2015). Argo: An exascale operating system. http://www.argo-osr.org/. Accessed 20 Nov 2015.
- Döbel, B., & Härtig, H. (2014). Can we put concurrency back into redundant multithreading? Proceedings of the 14th International Conference on Embedded Software, EMSOFT 2014 (pp. 19:1–19:10). USA: ACM.
- Döbel, B., Härtig, H., & Engel, M. (2012). Operating system support for redundant multithreading. Proceedings of the Tenth ACM International Conference on Embedded Software EMSOFT 2012 (pp. 83–92). USA: ACM.
- FFMK. FFMK Project Website. https://ffmk.tudos.org. Accessed 01 Feb 2018.
- Gerofi, B., Takagi, M., Hori, A., Nakamura, G., Shirasawa, T., & Ishikawa, Y. (2016). On the scalability, performance isolation and device driver transparency of the IHK/McKernel hybrid lightweight kernel. 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (pp. 1041–1050).
- Graham, R. L., Woodall, T. S., & Squyres, J. M. (2005). Open MPI: A flexible high performance MPI. Proceedings, 6th Annual International Conference on Parallel Processing and Applied Mathematics. Poland: Poznan.
- Härtig, H., & Roitzsch, M. (2006). Ten Years of Research on L4-Based Real-Time. Proceedings of the Eighth Real-Time Linux Workshop. China: Lanzhou.
- Härtig, H., Hohmuth, M., Liedtke, J., Schönberg, S., & Wolter, J. (1997). The performance of μ-kernel-based systems. SOSP 1997: Proceedings of the sixteenth ACM symposium on Operating systems principles (pp. 66–77). USA: ACM Press.
- Hoefler, T., Schneider, T., & Lumsdaine, A. (2010). Characterizing the influence of system noise on large-scale applications by simulation. Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010. USA: IEEE Computer Society.
- Lackorzynski, A., & Warg, A. (2009). Taming subsystems: capabilities as universal resource access control in L4. IIES 2009: Proceedings of the Second Workshop on Isolation and Integration in Embedded Systems (pp. 25–30). USA: ACM.
- Lackorzynski, A., Weinhold, C., & Härtig, H. (2016a). Combining predictable execution with full-featured commodity systems. Proceedings of OSPERT2016, the 12th Annual Workshop on Operating Systems Platforms for Embedded Real-Time Applications OSPERT 2016 (pp. 31–36).
- Lackorzynski, A., Weinhold, C., & Härtig, H. (2016b). Decoupled: Low-effort noise-free execution on commodity system. Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2016. USA: ACM.
- Lackorzynski, A., Weinhold, C., & Härtig, H. (2017). Predictable low-latency interrupt response with general-purpose systems. Proceedings of OSPERT2017, the 13th Annual Workshop on Operating Systems Platforms for Embedded Real-Time Applications OSPERT 2017 (pp. 19–24).
- Lawrence Livermore National Laboratory. The FTQ/FWQ Benchmark.
- Levy, E., Barak, A., Shiloh, A., Lieber, M., Weinhold, C., & Härtig, H. (2014). Overhead of a decentralized gossip algorithm on the performance of HPC applications. Proceedings of the ROSS 2014 (pp. 10:1–10:7). New York: ACM.
- Lieber, M., Grützun, V., Wolke, R., Müller, M. S., & Nagel, W. E. (2012). Highly scalable dynamic load balancing in the atmospheric modeling system COSMO-SPECS+FD4. Proceedings of the PARA 2010 (Vol. 7133, pp. 131–141). Berlin: Springer.
- Liedtke, J. (1995). On micro-kernel construction. SOSP 1995: Proceedings of the fifteenth ACM symposium on Operating systems principles (pp. 237–250). USA: ACM Press.
- microHPC. microHPC Project Website. https://microhpc.tudos.org. Accessed 01 Feb 2018.
- mvapichweb. MVAPICH: MPI over InfiniBand. http://mvapich.cse.ohio-state.edu/. Accessed 29 Jan 2017.
- Reussner, R., Sanders, P., & Larsson Träff, J. (2002). SKaMPI: a comprehensive benchmark for public benchmarking of MPI (pp. 10:55–10:65).
- Seelam, S., Fong, L., Tantawi, A., Lewars, J., Divirgilio, J., & Gildea, K. (2010). Extreme scale computing: Modeling the impact of system noise in multicore clustered systems. 2010 IEEE International Symposium on Parallel Distributed Processing (IPDPS).
- Singaravelu, L., Pu, C., Härtig, H., & Helmuth, C. (2006). Reducing TCB complexity for security-sensitive applications: three case studies. Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006, EuroSys 2006 (pp. 161–174). USA: ACM.
- The CP2K Developers Group. Open source molecular dynamics. http://www.cp2k.org/. Accessed 20 Nov 2015.
- Weinhold, C., & Härtig, H. (2011). jVPFS: adding robustness to a secure stacked file system with untrusted local storage components. Proceedings of the 2011 USENIX Conference on USENIX Annual Technical Conference, USENIXATC 2011 (p. 32). USA: USENIX Association.
- Weinhold, C., & Härtig, H. (2008). VPFS: building a virtual private file system with a small trusted computing base. Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems 2008, Eurosys 2008 (pp. 81–93). USA: ACM.
- Weinhold, C., Lackorzynski, A., Bierbaum, J., Küttler, M., Planeta, M., Härtig, H., et al. (2016). FFMK: A fast and fault-tolerant microkernel-based system for exascale computing. Software for Exascale Computing - SPPEXA 2013–2015 (Vol. 113, pp. 405–426).
- XtreemFS. XtreemFS - a cloud file system. http://www.xtreemfs.org. Accessed 16 May 2018.