Automatic Local Memory Management for Multicores Having Global Address Space

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10136)

Abstract

Embedded multicore processors for hard real-time applications such as automobile engine control require local memory on each processor core to meet real-time deadline constraints precisely, because cache misses prevent cache memory from guaranteeing deadlines. To use local memory, programmers or compilers must explicitly manage data movement and data replacement within its limited size. However, such management is extremely difficult and time-consuming for programmers. This paper proposes an automatic local memory management method performed by the compiler through (i) multi-dimensional data decomposition techniques that fit working sets into limited-size local memory; (ii) suitable block management structures, called Adjustable Blocks, that create application-specific fixed-size data transfer blocks; (iii) multi-dimensional templates that preserve the original multi-dimensional representation of decomposed multi-dimensional data mapped onto one-dimensional Adjustable Blocks; (iv) block replacement policies derived from liveness analysis of the decomposed data; and (v) code size reduction schemes that generate shorter code. The proposed method is implemented in the OSCAR multi-grain, multi-platform compiler and evaluated on the Renesas RP2 8-core embedded homogeneous multicore processor equipped with local and shared memory. Evaluations on five programs, including multimedia and scientific applications, show promising results. For instance, speedups on 8 cores relative to single-core execution using off-chip shared memory improve from 7.14 to 20.12 for an AAC encoder, from 1.97 to 7.59 for an MPEG2 encoder, from 5.73 to 7.38 for Tomcatv, and from 7.40 to 11.30 for Swim when local memory is used with the proposed method. These evaluations indicate the usefulness and validity of the proposed local memory management method on real embedded multicore processors.
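
As a rough illustration of items (i) and (iii) in the abstract, the following Python sketch splits a multi-dimensional array into tiles that fit a fixed-size block and maps a multi-dimensional index inside a tile to an offset in a one-dimensional buffer. It is not the OSCAR compiler's algorithm: the halving strategy, the function names `decompose` and `flat_offset`, and the 16 KiB block size are all assumptions made for illustration only.

```python
from math import ceil, prod


def decompose(shape, elem_bytes, block_bytes):
    """Return a tile shape whose footprint fits in one block.

    Hypothetical strategy: repeatedly halve the largest dimension
    until elem_bytes * (number of tile elements) <= block_bytes.
    """
    tile = list(shape)
    while elem_bytes * prod(tile) > block_bytes:
        # Pick the (first) largest dimension and halve it, rounding up.
        axis = max(range(len(tile)), key=lambda i: tile[i])
        if tile[axis] == 1:
            raise ValueError("block too small to hold a single element")
        tile[axis] = ceil(tile[axis] / 2)
    return tuple(tile)


def flat_offset(index, tile):
    """Map a multi-dimensional index inside a tile to a row-major
    offset in a one-dimensional block (a stand-in for a 'template')."""
    offset = 0
    for i, extent in zip(index, tile):
        if not 0 <= i < extent:
            raise IndexError("index outside tile")
        offset = offset * extent + i
    return offset


if __name__ == "__main__":
    # Assumed example: a 512 x 512 array of 4-byte elements and an
    # assumed 16 KiB block size (not a figure taken from the paper).
    tile = decompose((512, 512), elem_bytes=4, block_bytes=16 * 1024)
    print(tile)                       # tile shape that fits one block
    print(flat_offset((1, 2), tile))  # 1-D offset of element (1, 2)
```

The point of the second function is that code can keep addressing data with its original multi-dimensional indices while the storage underneath is a flat, fixed-size block.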

Acknowledgments

This work was partly supported by JSPS KAKENHI Grant Number JP15K00085.

Author information

Corresponding author

Correspondence to Hironori Kasahara.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Yamamoto, K., Shirakawa, T., Oki, Y., Yoshida, A., Kimura, K., Kasahara, H. (2017). Automatic Local Memory Management for Multicores Having Global Address Space. In: Ding, C., Criswell, J., Wu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2016. Lecture Notes in Computer Science, vol 10136. Springer, Cham. https://doi.org/10.1007/978-3-319-52709-3_21

  • DOI: https://doi.org/10.1007/978-3-319-52709-3_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52708-6

  • Online ISBN: 978-3-319-52709-3

  • eBook Packages: Computer Science, Computer Science (R0)
