
Cache- and Communication-aware Application Mapping for Shared-cache Multicore Processors

  • Conference paper
Architecture of Computing Systems – ARCS 2015 (ARCS 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9017)


Abstract

We propose and study a mapping algorithm optimized for shared-cache multicore processors. The performance requirements of modern applications are constantly growing, and processing huge amounts of data in real time is a trend even on mobile devices. Octa-core processors are now common in mobile phones and tablets, and if the trend continues, embedded devices with tens of cores will appear within the next few years. Conventional mapping algorithms are not well designed for shared-cache multicore processors. We discuss the importance of application mapping in terms of inter-application communication and shared-cache access delay, and propose an algorithm that optimizes both aspects with low computational complexity. First, the mapping region is calculated from the congregate degree of nodes; then the region is expanded by selecting the nearest nodes with the lowest average cache latency. Compared with other mapping algorithms, the proposed algorithm improves the average inter-application communication distance by up to 13.9%, while achieving near-optimal average cache latency. Results from real applications show that, compared with an incremental mapping algorithm, the proposed algorithm improves execution time by 8% and power consumption by 16.7%.
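The two-phase procedure described in the abstract (choose a region from the congregate degree of nodes, then grow it with the nearest, lowest-latency nodes) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a 2D mesh with XY routing, one shared-L2 bank per tile with addresses interleaved evenly across all banks (so a core's average cache latency is modeled as its mean hop count to every tile), and a hypothetical definition of congregate degree as the number of free tiles within one hop. The names `hops`, `avg_cache_latency`, `congregate_degree`, `map_application`, and `avg_comm_distance` are illustrative only.

```python
from itertools import product
from typing import Iterable, List, Tuple

Node = Tuple[int, int]  # (x, y) coordinate of a tile in a 2D mesh


def hops(a: Node, b: Node) -> int:
    """Manhattan (XY-routing) hop count between two tiles."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def avg_cache_latency(node: Node, width: int, height: int) -> float:
    """Mean hop count from `node` to every shared-cache bank.

    Assumption: one L2 bank per tile, addresses interleaved over all banks,
    so every bank is accessed equally often.
    """
    banks = list(product(range(width), range(height)))
    return sum(hops(node, b) for b in banks) / len(banks)


def congregate_degree(node: Node, free: set, radius: int = 1) -> int:
    """Hypothetical proxy: number of other free tiles within `radius` hops."""
    return sum(1 for f in free if f != node and hops(node, f) <= radius)


def map_application(n_tasks: int, free_nodes: Iterable[Node],
                    width: int, height: int) -> List[Node]:
    """Pick `n_tasks` free tiles for one application (illustrative sketch)."""
    free = set(free_nodes)
    if n_tasks <= 0:
        return []
    if n_tasks > len(free):
        raise ValueError("not enough free cores for this application")
    # Phase 1: seed the region at the free tile with the highest congregate
    # degree; ties go to the tile with the lower average cache latency.
    # Iterating over sorted() keeps tie-breaking deterministic.
    seed = max(sorted(free),
               key=lambda n: (congregate_degree(n, free),
                              -avg_cache_latency(n, width, height)))
    region = [seed]
    candidates = free - {seed}
    # Phase 2: grow the region one tile at a time, preferring the tile nearest
    # to the current region, then the one with the lowest average cache latency.
    while len(region) < n_tasks:
        nxt = min(sorted(candidates),
                  key=lambda n: (min(hops(n, r) for r in region),
                                 avg_cache_latency(n, width, height)))
        region.append(nxt)
        candidates.remove(nxt)
    return region


def avg_comm_distance(region: List[Node]) -> float:
    """Average pairwise hop count among the tiles of one application."""
    pairs = [(a, b) for i, a in enumerate(region) for b in region[i + 1:]]
    return sum(hops(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
```

On an empty 4x4 mesh, `map_application(4, all_tiles, 4, 4)` selects the central 2x2 block, whose tiles have both the highest congregate degree and the lowest mean distance to the interleaved cache banks; its average pairwise communication distance is 8/6 hops. Evaluating candidates by nearest distance first keeps each application compact, while the cache-latency tie-break pulls the region toward the mesh center, which is one way to read the trade-off the abstract describes.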

Author information

Correspondence to Thomas Canhao Xu.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Xu, T.C., Leppänen, V. (2015). Cache- and Communication-aware Application Mapping for Shared-cache Multicore Processors. In: Pinho, L., Karl, W., Cohen, A., Brinkschulte, U. (eds) Architecture of Computing Systems – ARCS 2015. ARCS 2015. Lecture Notes in Computer Science, vol 9017. Springer, Cham. https://doi.org/10.1007/978-3-319-16086-3_5

  • DOI: https://doi.org/10.1007/978-3-319-16086-3_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16085-6

  • Online ISBN: 978-3-319-16086-3

  • eBook Packages: Computer Science, Computer Science (R0)
