Adaptive Thread Scheduling in Chip Multiprocessors

International Journal of Parallel Programming

Abstract

The full potential of chip multiprocessors remains unexploited because operating systems employ architecture-oblivious thread schedulers. We introduce an adaptive, cache-hierarchy-aware scheduler that places threads so as to minimize inter-thread contention. A novel multi-metric scoring scheme characterizes the L1 cache access behavior of each thread, and scheduling decisions are based on these scores.
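The abstract names neither the individual metrics nor the decision rule, so the following is only an illustrative sketch of the general idea of score-based co-scheduling, not the authors' scheme. The two metrics (`l1_miss_rate`, `l1_access_rate`), the equal weights, and the greedy "pair highest with lowest" rule are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadProfile:
    """Per-thread L1 statistics. Both metrics are hypothetical placeholders."""
    name: str
    l1_miss_rate: float    # assumed metric: fraction of L1 accesses that miss
    l1_access_rate: float  # assumed metric: L1 access intensity, normalized to [0, 1]


def score(thread: ThreadProfile, w_miss: float = 0.5, w_access: float = 0.5) -> float:
    """Combine the metrics into one cache-pressure score (weights are illustrative)."""
    return w_miss * thread.l1_miss_rate + w_access * thread.l1_access_rate


def pair_threads(threads: list[ThreadProfile]) -> list[tuple[str, ...]]:
    """Greedy heuristic: co-schedule the highest-pressure thread with the lowest.

    Returns a list of name tuples; a leftover thread (odd count) runs alone.
    """
    ranked = sorted(threads, key=score)  # ascending cache pressure
    groups: list[tuple[str, ...]] = []
    while len(ranked) >= 2:
        low = ranked.pop(0)   # least cache-hungry remaining thread
        high = ranked.pop()   # most cache-hungry remaining thread
        groups.append((high.name, low.name))
    if ranked:
        groups.append((ranked[0].name,))
    return groups


if __name__ == "__main__":
    demo = [
        ThreadProfile("A", 0.9, 0.8),
        ThreadProfile("B", 0.1, 0.2),
        ThreadProfile("C", 0.6, 0.5),
        ThreadProfile("D", 0.3, 0.2),
    ]
    print(pair_threads(demo))
```

Pairing a cache-intensive thread with a light one is one plausible way to reduce contention on shared cache levels; an adaptive scheduler would presumably recompute the scores periodically as thread behavior changes.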

Author information

Corresponding author

Correspondence to Ismail Akturk.

About this article

Cite this article

Akturk, I., Ozturk, O. Adaptive Thread Scheduling in Chip Multiprocessors. Int J Parallel Prog 47, 1014–1044 (2019). https://doi.org/10.1007/s10766-019-00637-y

  • DOI: https://doi.org/10.1007/s10766-019-00637-y
