OS Scheduling Algorithms for Improving the Performance of Multithreaded Workloads

Durbhakula, Murthy

doi:10.1007/978-3-030-22871-2_15

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 997)

Included in the following conference series:

Intelligent Computing - Proceedings of the Computing Conference

Abstract

Major chip manufacturers have all introduced multicore microprocessors, and multi-socket systems built from these processors run a wide range of server applications. However, to the best of our knowledge, current commercial operating systems are not optimized for multithreaded workloads running on such servers. Cache-to-cache transfers and remote memory accesses both affect the performance of these workloads. This paper presents a unified approach to optimizing OS scheduling algorithms for both cache-to-cache transfers and remote DRAM accesses while also taking cache affinity into account. In each scheduling quantum we observe, for every thread, the pattern of local and remote cache-to-cache transfers as well as local and remote DRAM accesses. Applying different algorithms to these observations, we derive a new schedule of threads for the next quantum that respects cache affinity. The new schedule reduces both remote cache-to-cache transfers and remote DRAM accesses in the next scheduling quantum and improves overall performance. We present two algorithms of differing complexity for optimizing cache-to-cache transfers. One of these is a new, simpler algorithm that performs better when combined with algorithms that optimize remote DRAM accesses. For optimizing remote DRAM accesses we also present two algorithms. Although these two differ in algorithmic complexity, they perform equally well on the workloads presented in this paper. We evaluated the algorithms on three synthetic workloads and performed a sensitivity analysis with respect to remote cache-to-cache transfer latency and remote DRAM latency. We show that these algorithms can reduce overall latency by up to 16.79%, depending on the algorithm used.
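
The abstract does not spell out the algorithms themselves, so the following is only a minimal Python sketch of the general idea: use per-quantum counters to choose a thread-to-socket placement for the next quantum. It is a simple greedy heuristic written for illustration and is not one of the paper's algorithms. All names (`remote_cost`, `next_schedule`, `affinity_bonus`, the counter layouts) are hypothetical, and the sketch assumes that each socket has one local memory node with the same id and that the sockets have enough total capacity for all threads.

```python
def remote_cost(placement, c2c, dram, c2c_lat=1.0, dram_lat=1.0):
    """Latency-weighted count of remote events under a thread->socket placement.

    c2c maps a thread pair (a, b) to the cache-to-cache transfers observed
    between them; dram maps a thread to {memory_node: access_count}.
    """
    cost = 0.0
    for (a, b), n in c2c.items():
        if placement[a] != placement[b]:      # transfer crosses sockets
            cost += c2c_lat * n
    for t, per_node in dram.items():
        for node, n in per_node.items():
            if node != placement[t]:          # access goes to a remote node
                cost += dram_lat * n
    return cost


def next_schedule(prev, c2c, dram, num_sockets, capacity, affinity_bonus=0.0):
    """Greedy placement for the next quantum (illustrative heuristic only).

    Threads are placed in order of decreasing observed traffic. Each thread
    goes to the free socket with the highest score, where the score adds up
    transfers shared with threads already placed there, DRAM accesses to that
    socket's local node, and a bonus for staying on the previous socket.
    """
    traffic = {t: sum(dram.get(t, {}).values()) for t in prev}
    for (a, b), n in c2c.items():
        traffic[a] += n
        traffic[b] += n

    free = {s: capacity for s in range(num_sockets)}
    placement = {}
    for t in sorted(prev, key=lambda x: -traffic[x]):
        def score(s):
            shared = sum(
                n for (a, b), n in c2c.items()
                if (a == t and placement.get(b) == s)
                or (b == t and placement.get(a) == s)
            )
            local_dram = dram.get(t, {}).get(s, 0)
            stay = affinity_bonus if prev[t] == s else 0.0
            return shared + local_dram + stay

        # Ties go to the lowest-numbered socket that still has a free core.
        best = max((s for s in range(num_sockets) if free[s] > 0), key=score)
        placement[t] = best
        free[best] -= 1
    return placement


# Made-up counters for four threads on two dual-core sockets.
prev = {0: 0, 1: 1, 2: 0, 3: 1}
c2c = {(0, 1): 100, (2, 3): 80, (0, 2): 5}
dram = {0: {0: 50}, 1: {0: 40, 1: 10}, 2: {1: 30}, 3: {1: 60}}

new = next_schedule(prev, c2c, dram, num_sockets=2, capacity=2, affinity_bonus=10.0)
print(new, remote_cost(prev, c2c, dram), remote_cost(new, c2c, dram))
```

In this toy input the heavily sharing pairs end up on the same socket, which lowers the remote-event count computed by `remote_cost`. The `c2c_lat` and `dram_lat` weights are stand-ins for the remote latencies that the abstract says were varied in the sensitivity analysis.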

Acknowledgments

I would like to thank Prof. Alan Cox of Rice University for initially discussing with me the concept of optimizing OS scheduling algorithms to improve the performance of various workloads. I would also like to thank the reviewers of this work for their comments and feedback. Finally, I would like to thank my parents, wife, and kids for their moral support during the course of this work.

Author information

Authors and Affiliations

Indian Institute of Technology Hyderabad, Hyderabad, India
Murthy Durbhakula

Corresponding author

Correspondence to Murthy Durbhakula.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai
The Science and Information SAI Organization, Bradford, West Yorkshire, UK
Rahul Bhatia
The Science and Information SAI Organization, Bradford, West Yorkshire, UK
Supriya Kapoor

About this paper

Cite this paper

Durbhakula, M. (2019). OS Scheduling Algorithms for Improving the Performance of Multithreaded Workloads. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Intelligent Computing. CompCom 2019. Advances in Intelligent Systems and Computing, vol 997. Springer, Cham. https://doi.org/10.1007/978-3-030-22871-2_15

DOI: https://doi.org/10.1007/978-3-030-22871-2_15
Published: 23 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22870-5
Online ISBN: 978-3-030-22871-2
eBook Packages: Intelligent Technologies and Robotics (R0)
