Exploiting Semantics of Virtual Memory to Improve the Efficiency of the On-Chip Memory System

Li, Bin; Fang, Zhen; Zhao, Li; Jiang, Xiaowei; Li, Lin; Herdrich, Andrew; Iyer, Ravishankar; Makineni, Srihari

doi:10.1007/978-3-642-32820-6_24


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7484)

Included in the following conference series:

European Conference on Parallel Processing


Abstract

Different virtual memory regions (e.g., stack and heap) have different properties and characteristics. For example, stack data are thread-private by definition, while heap data can be shared between threads. Compared with heap memory, stack memory tends to receive a large number of accesses concentrated on a small number of pages. These facts have been largely ignored in the design of on-chip memory systems. In this paper, we propose two novel designs that exploit the unique characteristics of stack memory to optimize the on-chip memory system.

The first design is Anticipatory Superpaging: a superpage is created automatically for stack memory at the first page fault within a potential superpage, which increases TLB reach and reduces TLB misses. It is transparent to applications and does not require the kernel to run online analysis algorithms or copy pages.

The second design is Stack-Aware Cache Placement: stack accesses are routed to their local slices in a distributed shared cache, while non-stack accesses are still routed using cache-line interleaving. The primary benefit of this mechanism is reduced power consumption in the on-chip interconnect. Our simulation results show that the first design reduces TLB misses by 10% to 20%, and the second reduces interconnect power consumption by over 14%.
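To make the two mechanisms concrete, here is a toy software model, not the authors' kernel or hardware implementation. It assumes 4 KiB base pages, 2 MiB superpages, 64-byte cache lines, a stack known as a single address range, and one last-level cache slice per core (slice i next to core i). The names `AnticipatorySuperpager` and `home_slice` are hypothetical.

```python
# Toy model of the two mechanisms in the abstract. Page sizes, line size,
# class/function names, and the "one LLC slice per core" layout are
# illustrative assumptions, not details taken from the paper.
BASE_PAGE = 4 * 1024           # assumed 4 KiB base page
SUPERPAGE = 2 * 1024 * 1024    # assumed 2 MiB superpage
LINE = 64                      # assumed 64-byte cache line


class AnticipatorySuperpager:
    """Hypothetical fault handler: on the first fault inside a superpage-aligned
    region that lies entirely within the stack, map the whole region at once."""

    def __init__(self, stack_lo: int, stack_hi: int) -> None:
        self.stack_lo, self.stack_hi = stack_lo, stack_hi
        self.superpages = set()  # base addresses of regions mapped as superpages
        self.base_pages = set()  # base addresses of regions mapped as base pages
        self.faults = 0

    def _mapped(self, vaddr: int) -> bool:
        return (vaddr // SUPERPAGE * SUPERPAGE in self.superpages
                or vaddr // BASE_PAGE * BASE_PAGE in self.base_pages)

    def access(self, vaddr: int) -> None:
        if self._mapped(vaddr):
            return  # already mapped: no page fault
        self.faults += 1
        region = vaddr // SUPERPAGE * SUPERPAGE
        if self.stack_lo <= region and region + SUPERPAGE <= self.stack_hi:
            # Stack region: anticipate future faults and map the full superpage
            # now, so no later promotion analysis or page copying is needed.
            self.superpages.add(region)
        else:
            # Non-stack (or partially covered) region: ordinary demand paging.
            self.base_pages.add(vaddr // BASE_PAGE * BASE_PAGE)

    def tlb_entries(self) -> int:
        # TLB entries needed to cover every mapping (one per superpage or base page).
        return len(self.superpages) + len(self.base_pages)


def home_slice(paddr: int, is_stack: bool, core: int, n_slices: int) -> int:
    """Choose the LLC slice that services an access (one slice per core assumed)."""
    if is_stack:
        return core  # thread-private stack data stay in the requester's local slice
    return (paddr // LINE) % n_slices  # everything else: cache-line interleaving


if __name__ == "__main__":
    STACK_LO = 0x7F0000000000  # hypothetical 2 MiB-aligned, 8 MiB stack reservation
    pager = AnticipatorySuperpager(STACK_LO, STACK_LO + 8 * 1024 * 1024)
    for off in range(0, 1024 * 1024, BASE_PAGE):  # touch 1 MiB of stack page by page
        pager.access(STACK_LO + off)
    print(pager.faults, pager.tlb_entries())  # prints "1 1"
    print(home_slice(0x1040, True, 3, 16), home_slice(0x1040, False, 3, 16))  # prints "3 1"
```

In this model, touching 1 MiB of stack costs one fault and one TLB entry, versus 256 of each under plain 4 KiB demand paging. The trade-off of mapping a whole superpage eagerly is that physical memory may be committed before it is used. The routing function shows where the interconnect savings come from: a stack access is always serviced by the local slice, so it never crosses the network to a remote slice.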

Author information

Authors and Affiliations

Intel Corporation, Hillsboro, OR, 97124, USA
Bin Li, Li Zhao, Xiaowei Jiang, Lin Li, Andrew Herdrich, Ravishankar Iyer & Srihari Makineni
Nvidia, Austin, TX, 78717, USA
Zhen Fang

Editor information

Editors and Affiliations

University of Patras, Computer Technology Institute and Press “Diophantus”, N. Kazantzaki, 26504, Rio, Greece
Christos Kaklamanis
University of Patras, University Building B, 26504, Rio, Greece
Theodore Papatheodorou
Computer Technology Institute and Press “Diophantus”, University of Patras, N. Kazantzaki, 26504, Rio, Greece
Paul G. Spirakis

About this paper

Cite this paper

Li, B. et al. (2012). Exploiting Semantics of Virtual Memory to Improve the Efficiency of the On-Chip Memory System. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds) Euro-Par 2012 Parallel Processing. Euro-Par 2012. Lecture Notes in Computer Science, vol 7484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32820-6_24

DOI: https://doi.org/10.1007/978-3-642-32820-6_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32819-0
Online ISBN: 978-3-642-32820-6
eBook Packages: Computer Science, Computer Science (R0)
