Abstract
Different virtual memory regions (e.g., stack and heap) have different properties and characteristics. For example, stack data are thread-private by definition, while heap data can be shared between threads. Compared with heap memory, stack memory tends to receive a large number of accesses concentrated on a rather small number of pages. These facts have been largely ignored by designers. In this paper, we propose two novel designs that exploit stack memory's unique characteristics to optimize the on-chip memory system.
The first design is Anticipatory Superpaging: superpages for stack memory are created automatically at the first page fault in a potential superpage, increasing TLB reach and reducing TLB misses. It is transparent to applications and does not require the kernel to employ online analysis algorithms or page copying. The second design is Stack-Aware Cache Placement: stack accesses are routed to their local slices in a distributed shared cache, while non-stack accesses are still routed using cacheline interleaving. The primary benefit of this mechanism is reduced power consumption of the on-chip interconnect. Our simulation shows that the first design reduces TLB misses by 10% to 20%, and the second reduces interconnect power consumption by over 14%.
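The two mechanisms can be illustrated with a minimal sketch. This is a toy model written for this summary, not the authors' implementation: the page, superpage, and cacheline sizes, the slice count, and the function names `superpage_base`, `handle_page_fault`, and `home_slice` are all assumptions chosen for illustration.

```python
# Toy model of the two designs described in the abstract.
# All constants and names below are illustrative assumptions.
PAGE_SIZE = 4096                   # assumed base page size (bytes)
SUPERPAGE_SIZE = 2 * 1024 * 1024   # assumed superpage size (bytes)
CACHELINE_SIZE = 64                # assumed cacheline size (bytes)
NUM_SLICES = 16                    # assumed number of shared-cache slices


def superpage_base(addr):
    """Return the base of the aligned superpage-sized region holding addr."""
    return addr & ~(SUPERPAGE_SIZE - 1)


def handle_page_fault(fault_addr, is_stack, mapped):
    """Record a mapping for a faulting address.

    For a stack fault, the whole enclosing superpage is mapped at once
    (the anticipatory step); otherwise a single base page is mapped.
    `mapped` is a dict from base address to mapping size.
    """
    if is_stack:
        base, size = superpage_base(fault_addr), SUPERPAGE_SIZE
    else:
        base, size = fault_addr & ~(PAGE_SIZE - 1), PAGE_SIZE
    mapped[base] = size
    return base, size


def home_slice(addr, is_stack, local_slice):
    """Pick the cache slice that serves an access.

    Stack accesses go to the requesting core's local slice; all other
    accesses are spread across slices by cacheline interleaving.
    """
    if is_stack:
        return local_slice
    return (addr // CACHELINE_SIZE) % NUM_SLICES
```

In this model, one stack fault covers 512 base pages with a single mapping, which is where the larger TLB reach comes from, and a stack access never needs to cross the interconnect to a remote slice. A real kernel would also have to check that the superpage region lies within the stack area and that physical memory is available, details the abstract does not specify.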
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, B. et al. (2012). Exploiting Semantics of Virtual Memory to Improve the Efficiency of the On-Chip Memory System. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds) Euro-Par 2012 Parallel Processing. Euro-Par 2012. Lecture Notes in Computer Science, vol 7484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32820-6_24
DOI: https://doi.org/10.1007/978-3-642-32820-6_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32819-0
Online ISBN: 978-3-642-32820-6
eBook Packages: Computer Science, Computer Science (R0)