Adaptive Thread Scheduling in Chip Multiprocessors

Akturk, Ismail; Ozturk, Ozcan

doi:10.1007/s10766-019-00637-y

Published: 14 May 2019

International Journal of Parallel Programming, Volume 47, pages 1014–1044 (2019)

Ismail Akturk¹ &
Ozcan Ozturk²

Abstract

The full potential of chip multiprocessors remains unexploited because operating systems employ architecture-oblivious thread schedulers. We introduce an adaptive, cache-hierarchy-aware scheduler that places threads so as to minimize inter-thread contention. A novel multi-metric scoring scheme characterizes the L1 cache access behavior of each thread, and scheduling decisions are made based on these scores.
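
To make the idea of score-driven co-scheduling concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the abstract names neither the metrics nor the placement rule, so the two metrics (`miss_rate`, `access_rate`), the scalar `pressure()`, and the greedy heaviest-with-lightest pairing are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadScore:
    """Per-thread L1 profile. Both metrics are hypothetical stand-ins."""
    tid: int
    miss_rate: float    # assumed metric: L1 misses / L1 accesses
    access_rate: float  # assumed metric: L1 accesses per instruction

    def pressure(self) -> float:
        # Assumed scalar summary: L1 misses per instruction.
        return self.miss_rate * self.access_rate


def pair_threads(scores):
    """Greedily pair the lightest thread with the heaviest one.

    Returns a list of tuples of thread ids; each tuple is a group of
    threads meant to share one core's cache. With an odd number of
    threads, the median thread ends up alone.
    """
    ordered = sorted(scores, key=lambda s: s.pressure())
    pairs = []
    lo, hi = 0, len(ordered) - 1
    while lo < hi:
        pairs.append((ordered[lo].tid, ordered[hi].tid))
        lo += 1
        hi -= 1
    if lo == hi:
        pairs.append((ordered[lo].tid,))
    return pairs


if __name__ == "__main__":
    profile = [
        ThreadScore(0, miss_rate=0.10, access_rate=0.5),
        ThreadScore(1, miss_rate=0.02, access_rate=0.3),
        ThreadScore(2, miss_rate=0.20, access_rate=0.4),
        ThreadScore(3, miss_rate=0.01, access_rate=0.2),
    ]
    print(pair_threads(profile))
```

In an adaptive setting the scores would be refreshed periodically and the pairing recomputed, so that placements follow changes in program phase; how often to do so, and at what migration cost, is outside what the abstract states.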

Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
Ismail Akturk
Department of Computer Engineering, Bilkent University, Ankara, Turkey
Ozcan Ozturk

Corresponding author

Correspondence to Ismail Akturk.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Akturk, I., Ozturk, O. Adaptive Thread Scheduling in Chip Multiprocessors. Int J Parallel Prog 47, 1014–1044 (2019). https://doi.org/10.1007/s10766-019-00637-y

Received: 27 March 2015
Accepted: 07 May 2019
Published: 14 May 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s10766-019-00637-y
