Designing Parallel Operating Systems via Parallel Programming

Frachtenberg, Eitan; Davis, Kei; Petrini, Fabrizio; Fernandez, Juan; Sancho, José Carlos

doi:10.1007/978-3-540-27866-5_90

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3149)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

Ever-increasing demand for computing capability is driving the construction of ever-larger computer clusters, soon to be reaching tens of thousands of processors. Many functionalities of system software have failed to scale accordingly – systems are becoming more complex, less reliable, and less efficient. Our premise is that these deficiencies arise from a lack of global control and coordination of the processing nodes. In practice, current parallel machines are loosely-coupled systems that are used for solving inherently tightly-coupled problems. This paper demonstrates that existing and future systems can be made more scalable by using BSP-like parallel programming principles in the design and implementation of the system software, and by taking full advantage of the latest interconnection network hardware. Moreover, we show that this approach can also yield great improvements in efficiency, reliability, and simplicity.
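
As an illustration of the BSP-like principle the abstract refers to, the following is a minimal sketch of lock-step supersteps: every node computes locally, posts its messages, and then waits at a global barrier before any message is consumed. It is not the authors' system software. The names `bsp_run` and `make_all_sum` are hypothetical, and threads on one machine stand in for cluster nodes, with a software barrier taking the place of the interconnect's global synchronization.

```python
import threading


def bsp_run(num_nodes, num_supersteps, step_fn):
    """Run step_fn on every simulated node in lock-step supersteps."""
    barrier = threading.Barrier(num_nodes)
    inboxes = [[] for _ in range(num_nodes)]
    outboxes = [[] for _ in range(num_nodes)]  # outboxes[src] = [(dst, msg), ...]
    states = [None] * num_nodes

    def worker(rank):
        for step_no in range(num_supersteps):
            # Phase 1: local computation on messages from the previous superstep.
            received, inboxes[rank] = inboxes[rank], []
            states[rank], outboxes[rank] = step_fn(
                rank, step_no, states[rank], received
            )
            barrier.wait()  # every node has posted its outgoing messages

            # Phase 2: communication. Each node collects what is addressed to it.
            for src in range(num_nodes):
                for dst, msg in outboxes[src]:
                    if dst == rank:
                        inboxes[rank].append(msg)
            barrier.wait()  # delivery is complete before outboxes are reused

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return states


def make_all_sum(num_nodes):
    """Hypothetical example: every node learns the sum of all nodes' values."""
    def step(rank, step_no, state, received):
        if step_no == 0:
            value = rank + 1
            return value, [(dst, value) for dst in range(num_nodes)]
        return sum(received), []
    return step


print(bsp_run(4, 2, make_all_sum(4)))  # prints [10, 10, 10, 10]
```

The two barriers per superstep are what give the global coordination point: no node starts the next computation phase until all communication from the current one has been delivered.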

This work is partially supported by the U.S. Department of Energy through Los Alamos National Laboratory contract W-7405-ENG-36 and the Spanish MCYT under grant TIC2003-08154-C06-03.

Author information

Authors and Affiliations

Los Alamos National Laboratory, Los Alamos, NM, 87545, USA
Eitan Frachtenberg, Kei Davis, Fabrizio Petrini, Juan Fernandez & José Carlos Sancho

Editor information

Editors and Affiliations

No affiliation listed
Marco Danelutto
Computer Science Department, University of Pisa, Largo B. Pontecorvo 3, 56127, Pisa, Italy
Marco Vanneschi
Information Science and Technologies Institute (ISTI), Italian National Research Council (CNR), Area della Ricerca, Via Giuseppe Moruzzi 1, I-56126, Pisa, Italy
Domenico Laforenza

Cite this paper

Frachtenberg, E., Davis, K., Petrini, F., Fernandez, J., Sancho, J.C. (2004). Designing Parallel Operating Systems via Parallel Programming. In: Danelutto, M., Vanneschi, M., Laforenza, D. (eds) Euro-Par 2004 Parallel Processing. Euro-Par 2004. Lecture Notes in Computer Science, vol 3149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27866-5_90

DOI: https://doi.org/10.1007/978-3-540-27866-5_90
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22924-7
Online ISBN: 978-3-540-27866-5
eBook Packages: Springer Book Archive
