Reducing NoC and Memory Contention for Manycores

Chandru, Vishwanathan; Mueller, Frank

doi:10.1007/978-3-319-30695-7_22

Vishwanathan Chandru & Frank Mueller

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9637)

Abstract

Platforms consisting of many computing cores have become mainstream in high-performance computing, general-purpose computing and, lately, embedded systems. Such systems provide increased processing power and system availability, but often impose latencies and contention for memory accesses as multiple cores try to reference data at the same time. This may result in suboptimal performance unless special allocation policies are employed. On a multi-processor board with 50 or more processing cores, the NoC (Network-on-Chip) adds to this challenge. This work evaluates the impact of bank-aware and controller-aware allocation on NoC contention. Experiments show that targeted memory allocation reduces both execution times and NoC contention, the latter of which has not been studied before at this scale.

Acknowledgment

Tilera Corporation provided technical support for the research. This work was funded in part by NSF grants 1239246 and 1058779 as well as a grant from AFOSR via Securboration.

Author information

Authors and Affiliations

North Carolina State University, Raleigh, NC, USA
Vishwanathan Chandru & Frank Mueller

Corresponding author

Correspondence to Frank Mueller.

Editor information

Editors and Affiliations

Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
Frank Hannig
Faculty of Engineering (FEUP), University of Porto, Porto, Portugal
João M. P. Cardoso
Universität zu Lübeck, Lübeck, Germany
Thilo Pionteck
Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
Dietmar Fey
Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
Wolfgang Schröder-Preikschat
Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
Jürgen Teich

About this paper

Cite this paper

Chandru, V., Mueller, F. (2016). Reducing NoC and Memory Contention for Manycores. In: Hannig, F., Cardoso, J.M.P., Pionteck, T., Fey, D., Schröder-Preikschat, W., Teich, J. (eds) Architecture of Computing Systems – ARCS 2016. ARCS 2016. Lecture Notes in Computer Science, vol 9637. Springer, Cham. https://doi.org/10.1007/978-3-319-30695-7_22

DOI: https://doi.org/10.1007/978-3-319-30695-7_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30694-0
Online ISBN: 978-3-319-30695-7
eBook Packages: Computer Science, Computer Science (R0)
