OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers

Kimura, Keiji; Mase, Masayoshi; Mikami, Hiroki; Miyamoto, Takamichi; Shirako, Jun; Kasahara, Hironori

doi:10.1007/978-3-642-13374-9_13

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5898)

Abstract

OSCAR (Optimally Scheduled Advanced Multiprocessor) API has been designed for real-time, low-power embedded multicores so that the OSCAR parallelizing compiler can generate parallel programs for various multicores from different vendors. The OSCAR API was developed by Waseda University in collaboration with Fujitsu Laboratory, Hitachi, NEC, Panasonic, Renesas Technology, and Toshiba in the METI/NEDO project entitled “Multicore Technology for Realtime Consumer Electronics.” By using the OSCAR API as the interface between the OSCAR compiler and backend compilers, the OSCAR compiler enables, for various embedded multicores: hierarchical multigrain parallel processing with memory optimization under the capacity constraints of cache memory, local memory, distributed shared memory, and on-chip/off-chip shared memory; data transfer using a DMA controller; and power reduction control using DVFS (Dynamic Voltage and Frequency Scaling), clock gating, and power gating. In addition, because the OSCAR API is designed as a subset of OpenMP, a parallel program generated automatically by the OSCAR compiler with the OSCAR API can also be compiled by ordinary OpenMP compilers. This paper describes the OSCAR API and its compatibility with the OSCAR compiler through code examples. The OSCAR compiler and the OSCAR API are evaluated on an IBM Power5+ workstation, an IBM Power6 high-end SMP server, and RP2, a newly developed consumer electronics multicore chip by Renesas, Hitachi, and Waseda. The scalability evaluation shows that, on average, the OSCAR compiler with the OSCAR API achieves a 5.8 times speedup over sequential execution on the eight-core Power5+ workstation and a 2.9 times speedup on the four-core RP2. In addition, on the Power6 SMP server, the OSCAR compiler speeds up programs by up to 3.3 times compared with the IBM XL Fortran compiler.
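The compatibility claim rests on OSCAR API code being ordinary OpenMP. The following is a minimal hypothetical sketch, not actual OSCAR compiler output, of the style such generated code might take under that assumption: each core's statically scheduled work becomes one `section` inside a single `parallel sections` region, so any OpenMP compiler (or a plain C compiler that ignores the pragmas) can build it. The names `sum_range` and `parallel_sum_two_cores` are illustrative only, and the OSCAR-specific directives for DMA transfer and power control are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro-task: sum the elements a[lo..hi). */
static long sum_range(const int *a, size_t lo, size_t hi) {
    long s = 0;
    for (size_t i = lo; i < hi; ++i)
        s += a[i];
    return s;
}

/* Hypothetical two-core schedule: each section stands for the code
 * statically assigned to one core. Only standard OpenMP constructs
 * are used, so the pragmas are simply ignored without -fopenmp. */
long parallel_sum_two_cores(const int *a, size_t n) {
    assert(a != NULL || n == 0);   /* precondition on the input */
    long partial[2] = {0, 0};      /* shared: each core writes its own slot */
    size_t mid = n / 2;

#pragma omp parallel sections num_threads(2)
    {
#pragma omp section
        { partial[0] = sum_range(a, 0, mid); }   /* work for core 0 */
#pragma omp section
        { partial[1] = sum_range(a, mid, n); }   /* work for core 1 */
    }
    /* The parallel region ends with an implicit barrier, so both
     * partial results are complete and visible here. */
    return partial[0] + partial[1];
}
```

Compiled with `-fopenmp`, the two sections may run on two threads; compiled without it, they run one after the other and return the same sum, which mirrors the portability property described above.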
Due to low-power optimization on RP2, the OSCAR compiler with the OSCAR API achieves a maximum power reduction of 84% in the real-time execution mode.
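The reported figures can be compared on a common footing by computing parallel efficiency, taken here as the standard definition (not one stated in the paper) of average speedup $S_p$ divided by core count $p$, and by converting the power reduction into the fraction of power that remains. The abstract does not state the baseline for the 84% figure; it is assumed here to be the same workload run without the compiler's power control.

```latex
\[ E_p = \frac{S_p}{p}, \quad E_8^{\mathrm{Power5+}} = \frac{5.8}{8} = 0.725, \quad E_4^{\mathrm{RP2}} = \frac{2.9}{4} = 0.725 \]
\[ P_{\mathrm{OSCAR}} = (1 - 0.84)\,P_{\mathrm{base}} = 0.16\,P_{\mathrm{base}} \]
```

Under these assumptions both platforms reach about 72.5% average parallel efficiency, and the best-case power is about one sixth ($1/0.16 = 6.25$) of the baseline.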

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo, Japan
Keiji Kimura, Masayoshi Mase, Hiroki Mikami, Takamichi Miyamoto, Jun Shirako & Hironori Kasahara

Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, USA
Guang R. Gao & Xiaoming Li
Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA
Lori L. Pollock & John Cavazos

About this paper

Cite this paper

Kimura, K., Mase, M., Mikami, H., Miyamoto, T., Shirako, J., Kasahara, H. (2010). OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers. In: Gao, G.R., Pollock, L.L., Cavazos, J., Li, X. (eds) Languages and Compilers for Parallel Computing. LCPC 2009. Lecture Notes in Computer Science, vol 5898. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13374-9_13

DOI: https://doi.org/10.1007/978-3-642-13374-9_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13373-2
Online ISBN: 978-3-642-13374-9
eBook Packages: Computer Science, Computer Science (R0)