Abstract
Platforms with many computing cores have become mainstream in high-performance computing, general-purpose computing and, lately, embedded systems. Such systems provide increased processing power and system availability, but memory accesses often suffer added latency and contention when multiple cores reference data at the same time. This may result in sub-optimal performance unless special memory allocation policies are employed. On a multi-processor board with 50 or more processing cores, the network-on-chip (NoC) adds to this challenge. This work evaluates the impact of bank-aware and controller-aware memory allocation on NoC contention. Experiments show that targeted memory allocation reduces both execution times and NoC contention, the latter of which has not been studied before at this scale.
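To make the idea of bank-aware and controller-aware allocation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a purely hypothetical physical address layout (4 KiB pages, 4 memory controllers, 8 banks per controller, with controller and bank indices interleaved at page granularity). The function names `controller_of`, `bank_of`, `build_free_lists` and `allocate` are illustrative only and do not correspond to any Tilera API. The sketch groups free pages by (controller, bank) so that a thread can be served from one chosen controller and bank.

```python
from collections import defaultdict

# Hypothetical layout, assumed for illustration only; it is NOT taken
# from the paper or from any Tilera manual.
PAGE_SHIFT = 12        # 4 KiB pages
NUM_CONTROLLERS = 4    # assumed number of memory controllers
NUM_BANKS = 8          # assumed number of banks per controller


def controller_of(paddr):
    """Assumed mapping: consecutive pages rotate across controllers."""
    return (paddr >> PAGE_SHIFT) % NUM_CONTROLLERS


def bank_of(paddr):
    """Assumed mapping: the bank index sits just above the controller index."""
    return ((paddr >> PAGE_SHIFT) // NUM_CONTROLLERS) % NUM_BANKS


def build_free_lists(page_addrs):
    """Group free physical pages by their (controller, bank) pair."""
    free = defaultdict(list)
    for paddr in page_addrs:
        free[(controller_of(paddr), bank_of(paddr))].append(paddr)
    return free


def allocate(free, controller, bank, n_pages):
    """Take n_pages from one (controller, bank) pool, or fail loudly."""
    pool = free[(controller, bank)]
    if len(pool) < n_pages:
        raise MemoryError("not enough free pages on the requested bank")
    pages = pool[:n_pages]
    del pool[:n_pages]
    return pages


# Usage: 64 free pages, then two pages for a thread pinned to
# controller 1, bank 2.
free_lists = build_free_lists([i << PAGE_SHIFT for i in range(64)])
thread_pages = allocate(free_lists, controller=1, bank=2, n_pages=2)
```

Under these assumptions, a runtime would pick the controller closest to the requesting core on the mesh and give concurrently running threads disjoint banks. How the real hardware maps addresses to controllers and banks is platform specific and is not stated in the abstract.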
Acknowledgment
Tilera Corporation provided technical support for this research. This work was funded in part by NSF grants 1239246 and 1058779 as well as a grant from AFOSR via Securboration.
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Chandru, V., Mueller, F. (2016). Reducing NoC and Memory Contention for Manycores. In: Hannig, F., Cardoso, J.M.P., Pionteck, T., Fey, D., Schröder-Preikschat, W., Teich, J. (eds) Architecture of Computing Systems – ARCS 2016. ARCS 2016. Lecture Notes in Computer Science, vol 9637. Springer, Cham. https://doi.org/10.1007/978-3-319-30695-7_22
DOI: https://doi.org/10.1007/978-3-319-30695-7_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30694-0
Online ISBN: 978-3-319-30695-7
eBook Packages: Computer Science, Computer Science (R0)