Abstract
Embedded multicore processors for hard real-time applications such as automobile engine control require local memory on each processor core to meet real-time deadline constraints precisely, since cache memory cannot guarantee deadlines because of cache misses. To use local memory, programmers or compilers must explicitly manage data movement and data replacement within its limited size. However, such management is extremely difficult and time-consuming for programmers. This paper proposes an automatic local memory management method performed by the compiler through (i) multi-dimensional data decomposition techniques to fit working sets onto limited-size local memory, (ii) suitable block management structures, called Adjustable Blocks, to create application-specific fixed-size data transfer blocks, (iii) multi-dimensional templates to preserve the original multi-dimensional representations of the decomposed multi-dimensional data that are mapped onto one-dimensional Adjustable Blocks, (iv) block replacement policies derived from liveness analysis of the decomposed data, and (v) code size reduction schemes to generate shorter code. The proposed local memory management method is implemented in the OSCAR multi-grain and multi-platform compiler and evaluated on the Renesas RP2 8-core embedded homogeneous multicore processor equipped with local and shared memory. Evaluations on 5 programs, including multimedia and scientific applications, show promising results. For instance, speedups on 8 cores relative to single-core execution using off-chip shared memory for an AAC encoder program, an MPEG2 encoder program, Tomcatv, and Swim improve from 7.14 to 20.12, 1.97 to 7.59, 5.73 to 7.38, and 7.40 to 11.30, respectively, when using local memory with the proposed method. These evaluations indicate the usefulness and validity of the proposed local memory management method on real embedded multicore processors.
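The ideas named in items (i), (iii), and (iv) can be illustrated with a minimal sketch. It is not the OSCAR compiler's algorithm: the function names `decompose`, `flat_offset`, and `pick_victim`, the halving strategy, the row-major layout, and the "farthest next use" eviction rule are all assumptions chosen only to make the concepts concrete.

```python
from math import prod

def decompose(shape, capacity):
    """Assumed strategy: halve the largest dimension until one tile
    holds at most `capacity` elements (the local memory budget)."""
    tile = list(shape)
    while prod(tile) > capacity:
        d = max(range(len(tile)), key=lambda k: tile[k])
        if tile[d] == 1:
            raise ValueError("capacity is too small for a single element")
        tile[d] = (tile[d] + 1) // 2
    return tuple(tile)

def flat_offset(index, tile):
    """Template-like mapping (assumed row-major): multi-dimensional
    index inside a tile -> offset inside a one-dimensional block."""
    offset = 0
    for i, n in zip(index, tile):
        if not 0 <= i < n:
            raise IndexError("index outside tile")
        offset = offset * n + i
    return offset

def pick_victim(resident, next_use):
    """Assumed replacement rule: evict a block with no future use (dead)
    if one exists, otherwise the block whose next use is farthest away."""
    return max(resident, key=lambda b: next_use.get(b, float("inf")))

# Hypothetical example: a 512x512 array and a 4096-element local memory.
tile = decompose((512, 512), 4096)                          # (64, 64)
offset = flat_offset((2, 3), tile)                          # 2 * 64 + 3 = 131
victim = pick_victim(["A", "B", "C"], {"A": 5, "C": 9})     # "B" is dead
```

In this sketch the tile shape plays the role of a fixed-size transfer block, and the liveness information is reduced to a dictionary of next-use times; a real compiler would derive both from program analysis.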
Acknowledgments
This work was partly supported by JSPS KAKENHI Grant Number JP15K00085.
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Yamamoto, K., Shirakawa, T., Oki, Y., Yoshida, A., Kimura, K., Kasahara, H. (2017). Automatic Local Memory Management for Multicores Having Global Address Space. In: Ding, C., Criswell, J., Wu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2016. Lecture Notes in Computer Science, vol 10136. Springer, Cham. https://doi.org/10.1007/978-3-319-52709-3_21
DOI: https://doi.org/10.1007/978-3-319-52709-3_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52708-6
Online ISBN: 978-3-319-52709-3
eBook Packages: Computer Science, Computer Science (R0)