Abstract
A non-uniform memory architecture (NUMA) system consists of multiple nodes, each with a shared last-level cache (LLC). The shared LLC brings many benefits in cache utilization. However, the LLC can be seriously polluted by tasks that generate heavy I/O traffic over long periods, because an inclusive LLC that replaces a line also back-invalidates any valid copies of it in the upper-level caches. Much research on page coloring, partitioning, and pollute-buffer mechanisms has addressed this cache pollution, but no scheduling approach considers I/O-intensive tasks in NUMA systems. To address this problem, OS scheduling that reduces cache pollution is needed in NUMA systems.
In this paper, we propose a software-based mechanism that reduces shared LLC misses in NUMA systems. Our mechanism includes I/O traffic measurement and devil-conscious scheduling. The experimental results show that the LLC miss rate is reduced by up to 37.6% and that execution time improves by up to 1.48%.
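The abstract does not spell out the measurement or placement policy, so the following is only an illustrative sketch of how the two parts could fit together, not the authors' implementation. It assumes that I/O traffic is sampled from the Linux per-process `/proc/<pid>/io` counters (`read_bytes`, `write_bytes`), that a task whose I/O rate exceeds a threshold is treated as a "devil", and that devils are consolidated on node 0 so that they pollute a single LLC. The function names, the threshold, and the consolidation policy are all hypothetical.

```python
from typing import Dict


def parse_proc_io(text: str) -> int:
    """Return read_bytes + write_bytes from text in /proc/<pid>/io format."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = int(value.strip())
    return fields.get("read_bytes", 0) + fields.get("write_bytes", 0)


def io_rate(prev_bytes: int, cur_bytes: int, interval_s: float) -> float:
    """I/O traffic in bytes per second between two counter samples."""
    return (cur_bytes - prev_bytes) / interval_s


def assign_nodes(rates: Dict[int, float], num_nodes: int,
                 devil_threshold: float) -> Dict[int, int]:
    """Map pid -> NUMA node (hypothetical policy, assumed for illustration).

    Tasks at or above devil_threshold ("devils") all go to node 0.
    The remaining tasks are spread round-robin over the other nodes,
    or over all nodes when there are no devils or only one node.
    """
    devils = sorted(pid for pid, r in rates.items() if r >= devil_threshold)
    others = sorted(pid for pid, r in rates.items() if r < devil_threshold)

    placement = {pid: 0 for pid in devils}
    if devils and num_nodes > 1:
        clean_nodes = list(range(1, num_nodes))
    else:
        clean_nodes = list(range(num_nodes))
    for i, pid in enumerate(others):
        placement[pid] = clean_nodes[i % len(clean_nodes)]
    return placement
```

On a real system, the returned placement would still have to be enforced, for example by pinning each task to the cores of its node; that step is omitted here.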
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
An, D., Kim, J., Han, J., Eom, Y.I. (2012). Reducing Last Level Cache Pollution in NUMA Multicore Systems for Improving Cache Performance. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2012. ICCSA 2012. Lecture Notes in Computer Science, vol 7335. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31137-6_21
DOI: https://doi.org/10.1007/978-3-642-31137-6_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31136-9
Online ISBN: 978-3-642-31137-6