Automatic Local Memory Management for Multicores Having Global Address Space

Yamamoto, Kouhei; Shirakawa, Tomoya; Oki, Yoshitake; Yoshida, Akimasa; Kimura, Keiji; Kasahara, Hironori

doi:10.1007/978-3-319-52709-3_21

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10136)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

Embedded multicore processors for hard real-time applications, such as automobile engine control, need local memory on each processor core to meet real-time deadlines precisely, because cache misses prevent cache memory from satisfying the deadline requirements. To use local memory, programmers or compilers must explicitly manage data movement and data replacement within its limited capacity. Doing this by hand is extremely difficult and time-consuming for programmers. This paper proposes an automatic, compiler-based local memory management method consisting of (i) multi-dimensional data decomposition techniques that fit working sets into limited-size local memory; (ii) suitable block management structures, called Adjustable Blocks, that create application-specific, fixed-size data transfer blocks; (iii) multi-dimensional templates that preserve the original multi-dimensional representation of decomposed multi-dimensional data mapped onto one-dimensional Adjustable Blocks; (iv) block replacement policies derived from liveness analysis of the decomposed data; and (v) code size reduction schemes that generate shorter code. The proposed method is implemented in the OSCAR multi-grain and multi-platform compiler and evaluated on the Renesas RP2, an 8-core embedded homogeneous multicore processor equipped with local and shared memory. Evaluations on 5 programs, including multimedia and scientific applications, show promising results. For instance, when local memory is used with the proposed method, the speedups on 8 cores relative to single-core execution using off-chip shared memory improve from 7.14 to 20.12 for an AAC encoder, from 1.97 to 7.59 for an MPEG2 encoder, from 5.73 to 7.38 for Tomcatv, and from 7.40 to 11.30 for Swim. These evaluations indicate the usefulness and validity of the proposed local memory management method on real embedded multicore processors.
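The abstract names the mechanisms without showing how they fit together. The Python sketch below illustrates the general idea behind items (i), (iii) and (iv) under simplifying assumptions of our own, and is not the OSCAR compiler's implementation: local memory is modeled as a single fixed-size block (loosely standing in for an Adjustable Block), a 2-D array in shared memory is decomposed into whole-row tiles that fit the block, and a 2-D view over the 1-D block lets the loop body keep its original `a[i, j]` indexing. All names (`rows_per_tile`, `Template2D`, `load_tile`, `store_tile`, `scale_rows`, `pick_victim`) are hypothetical. The victim-selection rule, which evicts dead data first and otherwise falls back to the farthest next use (a Belady-style choice), is likewise an assumption and not the paper's replacement policy.

```python
# Hypothetical sketch: tiling a 2-D array into a fixed-size local-memory
# block, a 2-D "template" view over the 1-D block, and a liveness-based
# victim choice. Assumed model only, not the OSCAR compiler's code.
from dataclasses import dataclass


def rows_per_tile(ncols: int, block_words: int) -> int:
    """Largest number of whole rows of an ncols-wide array fitting one block."""
    rows = block_words // ncols
    if rows == 0:
        raise ValueError("one row does not fit in a block")
    return rows


@dataclass
class Template2D:
    """2-D view over a 1-D block holding rows [row0, row0 + rows) of an array."""
    block: list
    row0: int
    rows: int
    ncols: int

    def _offset(self, i: int, j: int) -> int:
        li = i - self.row0  # row index relative to the start of the tile
        if not (0 <= li < self.rows and 0 <= j < self.ncols):
            raise IndexError((i, j))
        return li * self.ncols + j  # row-major layout inside the block

    def __getitem__(self, ij):
        i, j = ij
        return self.block[self._offset(i, j)]

    def __setitem__(self, ij, value):
        i, j = ij
        self.block[self._offset(i, j)] = value


def load_tile(shared, block, row0, rows):
    """Copy rows [row0, row0 + rows) of the shared array into the block."""
    ncols = len(shared[0])
    for li in range(rows):
        block[li * ncols:(li + 1) * ncols] = shared[row0 + li]
    return Template2D(block, row0, rows, ncols)


def store_tile(shared, t: Template2D):
    """Write a tile back from its block to the shared array."""
    for li in range(t.rows):
        shared[t.row0 + li] = t.block[li * t.ncols:(li + 1) * t.ncols]


def scale_rows(shared, factor, block_words=8):
    """Scale every element, streaming the array through one fixed-size block."""
    nrows, ncols = len(shared), len(shared[0])
    block = [0] * block_words  # the single local-memory block
    step = rows_per_tile(ncols, block_words)
    for row0 in range(0, nrows, step):
        rows = min(step, nrows - row0)  # the last tile may be partial
        t = load_tile(shared, block, row0, rows)
        for i in range(row0, row0 + rows):
            for j in range(ncols):
                t[i, j] = t[i, j] * factor  # original 2-D indexing preserved
        store_tile(shared, t)


def pick_victim(resident, live, next_use):
    """Evict a block whose data is dead if one exists; otherwise evict the
    block whose next use is farthest away (assumed Belady-style fallback)."""
    dead = [b for b in resident if b not in live]
    if dead:
        return dead[0]
    return max(resident, key=lambda b: next_use[b])
```

For example, scaling a 5×3 array with an 8-word block processes it as tiles of 2, 2 and 1 rows, and the loop body never needs to know the data lives in a 1-D block.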

Acknowledgments

This work was partly supported by JSPS KAKENHI Grant Number JP15K00085.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Waseda University, Tokyo, Japan
Kouhei Yamamoto, Tomoya Shirakawa, Yoshitake Oki, Akimasa Yoshida, Keiji Kimura & Hironori Kasahara
Graduate School of Advanced Mathematical Sciences, Meiji University, Tokyo, Japan
Akimasa Yoshida

Corresponding author

Correspondence to Hironori Kasahara.

Editor information

Editors and Affiliations

University of Rochester, Rochester, New York, USA
Chen Ding
University of Rochester, Rochester, New York, USA
John Criswell
Huawei Inc., Santa Clara, California, USA
Peng Wu

About this paper

Cite this paper

Yamamoto, K., Shirakawa, T., Oki, Y., Yoshida, A., Kimura, K., Kasahara, H. (2017). Automatic Local Memory Management for Multicores Having Global Address Space. In: Ding, C., Criswell, J., Wu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2016. Lecture Notes in Computer Science, vol 10136. Springer, Cham. https://doi.org/10.1007/978-3-319-52709-3_21

DOI: https://doi.org/10.1007/978-3-319-52709-3_21
Published: 24 January 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52708-6
Online ISBN: 978-3-319-52709-3
eBook Packages: Computer Science, Computer Science (R0)