Understanding Co-run Degradations on Integrated Heterogeneous Processors

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8967)

Abstract

Co-runs of independent applications are common on systems with heterogeneous processors, such as data centers and mobile devices. Yet how co-runners influence one another on such systems remains poorly understood: previous studies on this topic were conducted on simulators with limited settings.

In this work, we conduct a comprehensive investigation of the performance of co-running jobs on integrated heterogeneous processors. The investigation produces a list of interesting and counter-intuitive findings. It reveals some critical design issues in modern operating systems in supporting heterogeneous processors, and suggests some potential solutions at the levels of program transformation and OS design.

Acknowledgments

We thank the reviewers for their helpful comments. This material is based upon work supported by a DOE Early Career Award and by the National Science Foundation (NSF) under Grant No. 1320796 and a CAREER Award. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DOE or NSF. This work is also partially supported by the 863 Program of China (2012AA010905), NSFC (61272144, 61272143) and NUDT/Hunan Innov. Fund. For PostGrad. (B120604, CX2012B029).

Author information

Corresponding author

Correspondence to Qi Zhu.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhu, Q., Wu, B., Shen, X., Shen, L., Wang, Z. (2015). Understanding Co-run Degradations on Integrated Heterogeneous Processors. In: Brodman, J., Tu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2014. Lecture Notes in Computer Science, vol 8967. Springer, Cham. https://doi.org/10.1007/978-3-319-17473-0_6

  • DOI: https://doi.org/10.1007/978-3-319-17473-0_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17472-3

  • Online ISBN: 978-3-319-17473-0

  • eBook Packages: Computer Science, Computer Science (R0)
