Parallel application characterization for multiprocessor scheduling policy design

Nguyen, Thu D.; Vaswani, Raj; Zahorjan, John

doi:10.1007/BFb0022294

Conference paper
First Online: 01 January 2005

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1162)

Abstract

Much of the recent work on multiprocessor scheduling disciplines has used abstract workload models to explore the fundamental, high-level properties of the various alternatives. As continuing work on these policies increases their level of sophistication, however, it is clear that the choice of appropriate policies must be guided at least in part by the typical behavior of actual parallel applications. Our goal in this paper is to examine a variety of such applications, providing measurements of properties relevant to scheduling policy design. We give measurements for both hand-coded parallel programs (from the SPLASH benchmark suites) and compiler-parallelized programs (from the PERFECT Club suite) running on a KSR-2 shared-memory multiprocessor.

The measurements we present are intended primarily to address two aspects of multiprocessor scheduling policy design:

In the spectrum between aggressively dynamic and static allocation policies, what is an appropriate choice for the rate at which reallocations should take place?
Is it possible to take measurements of application speedup and efficiency at runtime that are sufficiently accurate to guide allocation decisions?
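
For reference, the two quantities named in the second question can be written down using their standard textbook definitions: speedup S(p) = T(1) / T(p) and efficiency E(p) = S(p) / p. The sketch below is only an illustration of those definitions; the function names and the timings are hypothetical and are not taken from the paper, which may measure these quantities differently at runtime.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Standard speedup S(p) = T(1) / T(p)."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("execution times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """Standard efficiency E(p) = S(p) / p."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return speedup(t_serial, t_parallel) / processors


# Hypothetical timings in seconds (assumed values, not measurements from the paper).
print(speedup(120.0, 20.0))        # speedup on 8 processors
print(efficiency(120.0, 20.0, 8))  # corresponding efficiency
```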

We address these questions through three sets of measurements:

First, we examine application speedup, and the sources of speedup loss. Our results confirm that there is considerable variation in job speedup, and that the bulk of the speedup loss is due to communication and idleness.
Next, we examine runtime measurement of speedup information. We begin by looking at how such information might be acquired accurately and at acceptable cost. We then investigate the extent to which recent measurements of speedup accurately predict the future, and so the extent to which such measurements might reasonably be expected to guide allocation decisions.
Finally, we examine the durations of individual processor idle periods, and relate these to the cost of reallocating a processor at those times. These results shed light on the potential for aggressively dynamic policies to improve performance.
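
The relationship between idle-period duration and reallocation cost described in the last item can be sketched as a simple comparison. This is a minimal illustration under stated assumptions, not the authors' method: it assumes a single fixed reallocation cost charged once per idle period, and the function name, the cost, and the sample durations are all hypothetical.

```python
from typing import Iterable, Tuple


def usable_idle_time(idle_periods: Iterable[float],
                     realloc_cost: float) -> Tuple[int, float]:
    """Count idle periods longer than the reallocation cost and sum the
    time left over in those periods after paying that cost once each.

    Assumption: reallocating a processor costs a fixed `realloc_cost`
    per idle period; shorter periods are treated as not worth reclaiming.
    """
    if realloc_cost < 0:
        raise ValueError("reallocation cost must be non-negative")
    long_periods = [d for d in idle_periods if d > realloc_cost]
    usable = sum(d - realloc_cost for d in long_periods)
    return len(long_periods), usable


# Hypothetical idle-period durations and cost, in milliseconds.
count, usable = usable_idle_time([0.5, 2.0, 10.0, 0.1], realloc_cost=1.0)
print(count, usable)
```

Under this assumption, a workload whose idle periods are mostly shorter than the reallocation cost leaves little for an aggressively dynamic policy to reclaim.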

This work was supported in part by the National Science Foundation (Grants CCR-9123308 and CCR-9200832) and the Washington Technology Center.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
Thu D. Nguyen, Raj Vaswani & John Zahorjan

Editor information

Dror G. Feitelson, Larry Rudolph

About this paper

Cite this paper

Nguyen, T.D., Vaswani, R., Zahorjan, J. (1996). Parallel application characterization for multiprocessor scheduling policy design. In: Feitelson, D.G., Rudolph, L. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 1996. Lecture Notes in Computer Science, vol 1162. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0022294

DOI: https://doi.org/10.1007/BFb0022294
Published: 15 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61864-5
Online ISBN: 978-3-540-70710-3
eBook Packages: Springer Book Archive
