Abstract
Major chip manufacturers have all introduced multicore microprocessors, and multi-socket systems built from these processors are used to run a variety of server applications. However, to the best of our knowledge, current commercial operating systems are not optimized for multi-threaded workloads running on such servers. Cache-to-cache transfers and remote memory accesses both affect the performance of such workloads. This paper presents a unified approach to optimizing OS scheduling algorithms for both cache-to-cache transfers and remote DRAM accesses that also takes cache affinity into account. We observe the patterns of local and remote cache-to-cache transfers, as well as local and remote DRAM accesses, for every thread in each scheduling quantum. By applying different algorithms to these observations, we derive a new schedule of threads for the next quantum that takes cache affinity into account. This new schedule reduces both remote cache-to-cache transfers and remote DRAM accesses in the next scheduling quantum and improves overall performance. We present two algorithms of varying complexity for optimizing cache-to-cache transfers. One of these is a new algorithm that is simpler and performs better when combined with algorithms that optimize remote DRAM accesses. For optimizing remote DRAM accesses, we also present two algorithms. Although these two algorithms differ in complexity, they perform equally well for the workloads presented in this paper. We used three different synthetic workloads to evaluate these algorithms. We also performed a sensitivity analysis with respect to remote cache-to-cache transfer latency and remote DRAM latency. We show that these algorithms can reduce overall latency by up to 16.79%, depending on the algorithm used.
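The abstract does not spell out the algorithms themselves, so the following is only a minimal illustrative sketch of the general idea, not the paper's method. It assumes that per-thread counters for one quantum are already available: `dram[t][s]` is the number of DRAM accesses thread `t` made to memory attached to socket `s`, and `c2c[(a, b)]` is the number of cache-to-cache transfers between threads `a` and `b`. The function name `next_schedule`, the greedy placement order, and the `affinity_bonus` parameter are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple


def next_schedule(
    current: Dict[str, int],          # thread -> socket it ran on this quantum
    dram: Dict[str, List[int]],       # thread -> DRAM accesses per socket's memory
    c2c: Dict[Tuple[str, str], int],  # (thread, thread) -> cache-to-cache transfers
    cores_per_socket: int,
    affinity_bonus: float = 1.0,      # hypothetical weight for staying put
) -> Dict[str, int]:
    """Greedy sketch: place each thread on the socket that keeps the most
    of its observed traffic local, preferring its current socket on ties."""
    n_sockets = len(next(iter(dram.values())))
    if len(current) > n_sockets * cores_per_socket:
        raise ValueError("more threads than cores")

    free = [cores_per_socket] * n_sockets
    placed: Dict[str, int] = {}

    def shared(a: str, b: str) -> int:
        # Transfers are treated as symmetric between the two threads.
        return c2c.get((a, b), 0) + c2c.get((b, a), 0)

    def traffic(t: str) -> int:
        return sum(dram[t]) + sum(shared(t, u) for u in current if u != t)

    # Place the heaviest threads first; break ties by name for determinism.
    for t in sorted(current, key=lambda t: (-traffic(t), t)):
        best_socket, best_score = None, None
        for s in range(n_sockets):
            if free[s] == 0:
                continue
            # DRAM accesses that would become local on socket s.
            score = float(dram[t][s])
            # Transfers that would become local given threads already placed on s.
            score += sum(shared(t, u) for u, su in placed.items() if su == s)
            # Cache affinity: small reward for not migrating.
            if s == current[t]:
                score += affinity_bonus
            if best_score is None or score > best_score:
                best_socket, best_score = s, score
        placed[t] = best_socket
        free[best_socket] -= 1
    return placed
```

In this sketch the affinity bonus only decides between sockets whose locality scores are nearly equal, so a thread migrates only when the observed traffic clearly favors another socket. A real scheduler would also need to account for the cost of migration and for how counters are collected, neither of which is modeled here.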
Acknowledgments
I would like to thank Prof. Alan Cox of Rice University for initially discussing with me the concept of optimizing OS scheduling algorithms to improve the performance of various workloads. I would also like to thank the reviewers of this work for their comments and feedback. Finally, I would like to thank my parents, wife, and kids for their moral support during the course of this work.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Durbhakula, M. (2019). OS Scheduling Algorithms for Improving the Performance of Multithreaded Workloads. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Intelligent Computing. CompCom 2019. Advances in Intelligent Systems and Computing, vol 997. Springer, Cham. https://doi.org/10.1007/978-3-030-22871-2_15
DOI: https://doi.org/10.1007/978-3-030-22871-2_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22870-5
Online ISBN: 978-3-030-22871-2
eBook Packages: Intelligent Technologies and Robotics (R0)