Skip to main content

A Parallelizing Compiler Cooperative Heterogeneous Multicore Processor Architecture

  • Chapter

Part of the book series: Lecture Notes in Computer Science ((THIPEAC,volume 6760))

Abstract

Heterogeneous multicore architectures, integrating several kinds of accelerator cores in addition to general purpose processor cores, have been attracting much attention to realize high performance with low power consumption. To attain effective high performance, high application software productivity, and low power consumption on heterogeneous multicores, cooperation between an architecture and a parallelizing compiler is important. This paper proposes a compiler cooperative heterogeneous multicore architecture and parallelizing compilation scheme for it. Performance of the proposed scheme is evaluated on the heterogeneous multicore integrating Hitachi and Renesas’ SH4A processor cores and Hitachi’s FE-GA accelerator cores, using an MP3 encoder. The heterogeneous multicore gives us 14.34 times speedup with two SH4As and two FE-GAs, and 26.05 times speedup with four SH4As and four FE-GAs against sequential execution with a single SH4A. The cooperation between the heterogeneous multicore architecture and the parallelizing compiler enables to achieve high performance in a short development period.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Hammond, L., Hubbert, B.A., Siu, M., Prabhu, M.K., Chen, M., Olukotun, K.: The stanford hydra CMP. IEEE Micro 20, 71–84 (2000)

    Article  Google Scholar 

  2. ARM Limited: ARM11 MPCore Processor Technical Reference Manual (2005)

    Google Scholar 

  3. Friedrich, J., McCredie, B., James, N., Huott, B., Curran, B., Fluhr, E., Mittal, G., Chan, E., Chan, Y., Plass, D., Chu, S., Le, H., Clark, L., Ripley, J., Taylor, S., Dilullo, J., Lanzerotti, M.: Design of the Power6 microprocessor. In: Digest of Technical Papers of the 2007 IEEE International Solid-State Circuits Conference, pp. 96–97 (February 2007)

    Google Scholar 

  4. Taylor, M.B., Kim, J., Miller, J., Wentzlaff, D., Ghodrat, F., Greenwald, B., Hoffman, H., Johnson, P., Lee, J.W., Lee, W., Ma, A., Saraf, A., Seneski, M., Shnidman, N., Strumpen, V., Frank, M., Amarasinghe, S., Agarwal, A.: The raw microprocessor: A computational fabric for software circuits and general purpose programs. IEEE Micro 22, 25–35 (2002)

    Article  Google Scholar 

  5. Sankaralingam, K., Nagarajan, R., Liu, H., Kim, C., Huh, J., Burger, D., Keckler, S.W., Moore, C.R.: Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture. In: Proceedings of the 30th Annual International Symposium on Computer Architecture, pp. 422–433 (June 2003)

    Google Scholar 

  6. Shiota, T., Kawasaki, K., Kawabe, Y., Shibamoto, W., Sato, A., Hashimoto, T., Hayakawa, F., Tago, S., Okano, H., Nakamura, Y., Miyake, H., Suga, A., Takahashi, H.: A 51.2GOPS 1.0GB/s-DMA single-chip multi-processor integrating quadruple 8-Way VLIW processors. In: Digest of Technical Papers of the 2005 IEEE International Solid-State Circuits Conference, pp. 194–593 (February 2005)

    Google Scholar 

  7. Sohi, G.S., Breach, S.E., Vijaykumar, T.N.: Multiscalar processors. In: Proceedings of 22nd Annual International Symposium on Computer Architecture, pp. 414–425 (June 1995)

    Google Scholar 

  8. Vangal, S., Howard, J., Ruhl, G., Dighe, S., Wilson, H., Tschanz, J., Finan, D., Iyer, P., Singh, A., Jacob, T., Jain, S., Venkataraman, S., Hoskote, Y., Borkar, N.: An 80-Tile 1.28TFLOPS network-on-chip in 65nm CMOS. In: Digest of Technical Papers of the 2007 IEEE International Solid-State Circuits Conference, pp. 98–589 (February 2007)

    Google Scholar 

  9. Seiler, L., Carmean, D., Sprangle, E., Forsyth, T., Abrash, M., Dubey, P., Junkins, S., Lake, A., Sugerman, J., Cavin, R., Espasa, R., Grochowski, E., Juan, T., Hanrahan, P.: Larrabee: A many-core x86 architecture for visual computing. ACM Transactions on Graphics 27(3) (2008)

    Google Scholar 

  10. Pham, D., Asano, S., Bolliger, M., Day, M.N., Hofstee, H.P., Johns, C., Kahle, J., Kameyama, A., Keaty, J., Masubuchi, Y., Riley, M., Shippy, D., Stasiak, D., Suzuoki, M., Wang, M., Warnock, J., Weitzel, S., Wendel, D., Yamazaki, T., Yazawa, K.: The design and implementation of a first-generation CELL processor. In: Digest of Technical Papers of the 2005 IEEE International Solid-State Circuits Conference, pp. 184–592 (February 2005)

    Google Scholar 

  11. Khailany, B., Williams, T., Lin, J., Long, E., Rygh, M., Tovey, D., Dally, W.J.: A programmable 512 GOPS stream processor for signal, image, and video processing. In: Digest of Technical Papers of the 2007 IEEE International Solid-State Circuits Conference, pp. 272–602 (February 2007)

    Google Scholar 

  12. Torii, S., Suzuki, S., Tomonaga, H., Tokue, T., Sakai, J., Suzuki, N., Murakami, K., Hiraga, T., Shigemoto, K., Tatebe, Y., Ohbuchi, E., Kayama, N., Edahiro, M., Kusano, T., Nishi, N.: A 600MIPS 120mW 70μA leakage triple-CPU mobile application processor chip. In: Digest of Technical Papers of the 2005 IEEE International Solid-State Circuits Conference, pp. 136–589 (February 2005)

    Google Scholar 

  13. Ito, M., Todaka, T., Tsunoda, T., Tanaka, H., Kodama, T., Shikano, H., Onouchi, M., Uchiyama, K., Odaka, T., Kamei, T., Nagahama, E., Kusaoke, M., Nitta, Y., Wada, Y., Kimura, K., Kasahara, H.: Heterogeneous multiprocessor on a chip which enables 54x AAC-LC stereo encoding. In: Proceedings of the 2007 IEEE Symposium on VLSI Circuits, pp. 18–19 (June 2007)

    Google Scholar 

  14. Kumar, R., Tullsen, D.M., Ranganathan, P., Jouppi, N.P., Farkas, K.I.: Single-ISA heterogeneous multi-core architectures for multithreaded workload performance. In: Proceedings of the 31st Annual International Symposium on Computer Architecture, pp. 64–75 (June 2004)

    Google Scholar 

  15. Shikano, H., Suzuki, Y., Wada, Y., Shirako, J., Kimura, K., Kasahara, H.: Performance evaluation of heterogeneous chip multi-processor with MP3 audio encoder. In: Proceedings of the IEEE Symposium on Low-Power and High Speed Chips, pp. 349–363 (April 2006)

    Google Scholar 

  16. Noda, H., Tanizaki, T., Gyohten, T., Dosaka, K., Nakajima, M., Mizumoto, K., Yoshida, K., Iwao, T., Nishijima, T., Okuno, Y., Arimoto, K.: The circuits and robust design methodology of the massively parallel processor based on the matrix architecture. In: Digest of Technical Papers of the 2006 Symposium on VLSI Circuits, pp. 210–211 (2006)

    Google Scholar 

  17. NVIDIA Corporation: NVIDIA CUDA Compute Unified Device Architecture Programming Guide (2008)

    Google Scholar 

  18. Xie, T., Qin, X.: Stochastic scheduling with availability constraints in heterogeneous clusters. In: Proceedings of the 2006 IEEE International Conference on Cluster Computing, pp. 1–10 (September 2006)

    Google Scholar 

  19. Sih, G.C., Lee, E.A.: A compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures. IEEE Transactions on Parallel and Distributed Systems 4, 175–187 (1993)

    Article  Google Scholar 

  20. Chan, W.Y., Li, C.K.: Scheduling tasks in DAG to heterogeneous processor system. In: Proceedings of the 6th Euromicro Workshop on Parallel and Distributed Processing, pp. 27–31 (January 1998)

    Google Scholar 

  21. Topcuoglu, H., Hariri, S., Wu, M.Y.: Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems 13, 260–274 (2002)

    Article  Google Scholar 

  22. Kasahara, H., Honda, H., Narita, S.: Parallel processing of near fine grain tasks using static scheduling on OSCAR (Optimally SCheduled Advanced multiprocessoR). In: Proceedings of Supercomputing ’90, pp. 856–864 (November 1990)

    Google Scholar 

  23. Kimura, K., Kodaka, T., Obata, M., Kasahara, H.: Multigrain parallel processing on OSCAR CMP. In: Proceedings of the 2003 International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems (January 2003)

    Google Scholar 

  24. Ishizaka, K., Miyamoto, T., Shirako, J., Obata, M., Kimura, K., Kasahara, H.: Performance of OSCAR multigrain parallelizing compiler on SMP servers. In: Proceedings of the 17th International Workshop on Languages and Compilers for Parallel Computing (September 2004)

    Google Scholar 

  25. Kimura, K., Wada, Y., Nakano, H., Kodaka, T., Shirako, J., Ishizaka, K., Kasahara, H.: Multigrain parallel processing on compiler cooperative chip multiprocessor. In: Proceedings of the 9th Annual Workshop on Interaction between Compilers and Computer Architectures, pp. 11–20 (February 2005)

    Google Scholar 

  26. Kasahara, H., Ogata, W., Kimura, K., Matsui, G., Matsuzaki, H., Okamoto, M., Yoshida, A., Honda, H.: OSCAR multi-grain architecture and its evaluation. In: Proceedings of the 1997 International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems, pp. 106–115 (October 1997)

    Google Scholar 

  27. Kasahara, H., Honda, H., Mogi, A., Ogura, A., Fujiwara, K., Narita, S.: A multi-grain parallelizing compilation scheme for OSCAR (Optimally scheduled advanced multiprocessor). In: Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing, pp. 283–297 (August 1991)

    Google Scholar 

  28. Obata, M., Shirako, J., Kaminaga, H., Ishizaka, K., Kasahara, H.: Hierarchical parallelism control for multigrain parallel processing. In: Pugh, B., Tseng, C.-W. (eds.) LCPC 2002. LNCS, vol. 2481, pp. 31–44. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  29. Shirako, J., Nagasawa, K., Ishizaka, K., Obata, M., Kasahara, H.: Selective inline expansion for improvement of multi grain parallelism. In: The IASTED International Conference on Parallel and Distributed Computing and Networks, pp. 128–134 (February 2004)

    Google Scholar 

  30. Yoshida, Y., Kamei, T., Hayase, K., Shibahara, S., Nishii, O., Hattori, T., Hasegawa, A., Takada, M., Irie, N., Uchiyama, K., Odaka, T., Takada, K., Kimura, K., Kasahara, H.: A 4320MIPS four-processor core SMP/AMP with individually managed clock frequency for low power consumption. In: Digest of Technical Papers of the 2007 IEEE International Solid-State Circuits Conference, pp. 100–590 (February 2007)

    Google Scholar 

  31. Kodama, T., Tsunoda, T., Takada, M., Tanaka, H., Akita, Y., Sato, M., Ito, M.: Flexible engine: A dynamic reconfigurable accelerator with high performance and low power consumption. In: Proceedings of the IEEE Symposium on Low-Power and High Speed Chips, pp. 393–408 (April 2006)

    Google Scholar 

  32. UZURA3: MPEG1/LayerIII encoder in FORTRAN90, http://members.at.infoseek.co.jp/kitaurawa/index_e.html

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wada, Y. et al. (2011). A Parallelizing Compiler Cooperative Heterogeneous Multicore Processor Architecture. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers IV. Lecture Notes in Computer Science, vol 6760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24568-8_11

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-24568-8_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24567-1

  • Online ISBN: 978-3-642-24568-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics