A Parallelizing Compiler Cooperative Heterogeneous Multicore Processor Architecture

Wada, Yasutaka; Hayashi, Akihiro; Masuura, Takeshi; Shirako, Jun; Nakano, Hirofumi; Shikano, Hiroaki; Kimura, Keiji; Kasahara, Hironori

doi:10.1007/978-3-642-24568-8_11

Chapter

Part of the book series: Lecture Notes in Computer Science (THIPEAC, volume 6760)

Abstract

Heterogeneous multicore architectures, which integrate several kinds of accelerator cores alongside general-purpose processor cores, have attracted much attention as a way to achieve high performance with low power consumption. To attain high effective performance, high application software productivity, and low power consumption on heterogeneous multicores, cooperation between the architecture and a parallelizing compiler is important. This paper proposes a compiler-cooperative heterogeneous multicore architecture and a parallelizing compilation scheme for it. The performance of the proposed scheme is evaluated with an MP3 encoder on a heterogeneous multicore that integrates SH4A processor cores from Hitachi and Renesas with Hitachi's FE-GA accelerator cores. Relative to sequential execution on a single SH4A, the heterogeneous multicore achieves a 14.34-fold speedup with two SH4As and two FE-GAs, and a 26.05-fold speedup with four SH4As and four FE-GAs. Cooperation between the heterogeneous multicore architecture and the parallelizing compiler makes it possible to achieve high performance within a short development period.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Waseda University, 3-4-1 Ohkubo, Shinjuku-ku, Tokyo, 169-8555, Japan
Yasutaka Wada, Akihiro Hayashi, Takeshi Masuura, Jun Shirako, Hirofumi Nakano, Hiroaki Shikano, Keiji Kimura & Hironori Kasahara

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Chalmers University of Technology, 412 96, Gothenburg, Sweden
Per Stenström

About this chapter

Cite this chapter

Wada, Y. et al. (2011). A Parallelizing Compiler Cooperative Heterogeneous Multicore Processor Architecture. In: Stenström, P. (ed.) Transactions on High-Performance Embedded Architectures and Compilers IV. Lecture Notes in Computer Science, vol 6760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24568-8_11

DOI: https://doi.org/10.1007/978-3-642-24568-8_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24567-1
Online ISBN: 978-3-642-24568-8
eBook Packages: Computer Science, Computer Science (R0)