MAP-numa: Access Patterns Used to Characterize the NUMA Memory Access Optimization Techniques and Algorithms

Luo, Qiuming; Liu, Chenjian; Kong, Chang; Cai, Ye

doi:10.1007/978-3-642-35606-3_24


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7513)

Included in the following conference series:

IFIP International Conference on Network and Parallel Computing


Abstract

Some typical memory access patterns are provided, programmed in C, which can be used as a benchmark to characterize the various techniques and algorithms that aim to improve the performance of NUMA memory access. These access patterns, called MAP-numa (Memory Access Patterns for NUMA), currently comprise three classes, whose working data sets correspond to a 1-dimensional array, a 2-dimensional matrix, and a 3-dimensional cube. MAP-numa is dedicated to NUMA memory access optimization rather than to measuring memory bandwidth and latency, and it is an alternative to existing benchmarks such as STREAM and pChase. It is used to verify the capability of optimizations (made automatically or manually, to source code or to executable binaries) by investigating which locality leakage they can remedy. Some experimental results are shown, which give an example of using MAP-numa to evaluate some optimizations based on OProfile sampling.
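To make the three working-set classes concrete, the following is a minimal sketch in C of one traversal per class: a strided walk over a 1-dimensional array, a row-wise or column-wise walk over a 2-dimensional matrix, and a per-worker slab walk over a 3-dimensional cube. It is not the MAP-numa source. The function names, the choice of traversals, and the slab partitioning are all assumptions made for illustration, and the NUMA-specific parts (page placement, thread binding) are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* 1-D class (hypothetical pattern): visit every element of an n-element
 * array exactly once, but in strided order rather than sequentially. */
double sum_1d_strided(const double *a, size_t n, size_t stride)
{
    assert(stride > 0);
    double sum = 0.0;
    for (size_t start = 0; start < stride && start < n; start++)
        for (size_t i = start; i < n; i += stride)
            sum += a[i];
    return sum;
}

/* 2-D class (hypothetical pattern): the matrix is stored row-major.
 * by_column == 0 walks along rows (contiguous accesses);
 * by_column != 0 walks along columns (accesses cols elements apart). */
double sum_2d(const double *m, size_t rows, size_t cols, int by_column)
{
    double sum = 0.0;
    if (by_column) {
        for (size_t j = 0; j < cols; j++)
            for (size_t i = 0; i < rows; i++)
                sum += m[i * cols + j];
    } else {
        for (size_t i = 0; i < rows; i++)
            for (size_t j = 0; j < cols; j++)
                sum += m[i * cols + j];
    }
    return sum;
}

/* 3-D class (hypothetical pattern): the cube is stored with x fastest,
 * then y, then z. Each worker sums its own contiguous range of z-planes,
 * the part of the cube a thread on one node would ideally find local. */
double sum_3d_slab(const double *c, size_t nx, size_t ny, size_t nz,
                   size_t worker, size_t nworkers)
{
    assert(nworkers > 0 && worker < nworkers);
    size_t z0 = nz * worker / nworkers;       /* first plane of this slab */
    size_t z1 = nz * (worker + 1) / nworkers; /* one past the last plane  */
    double sum = 0.0;
    for (size_t z = z0; z < z1; z++)
        for (size_t y = 0; y < ny; y++)
            for (size_t x = 0; x < nx; x++)
                sum += c[(z * ny + y) * nx + x];
    return sum;
}
```

In a real measurement, each function would be called from threads bound to different nodes, and the local versus remote access counts would be compared before and after an optimization is applied. Which node holds each page depends on the memory policy in effect (on Linux, first touch by default), and the sketch does not control that.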




Author information

Authors and Affiliations

National High Performance Computing Center (NHPCC), Shenzhen, China
Qiuming Luo & Ye Cai
College of Computer Science and Software Engineering, Shenzhen University, China
Qiuming Luo, Chenjian Liu, Chang Kong & Ye Cai
State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Ye Cai


Editor information

Editors and Affiliations

Department of Computer Science and Engineering, SeoulTech, 172 Gongreung 2-dong, Nowon-gu, 139-743, Seoul, Korea
James J. Park
School of Information Technologies, The University of Sydney, Building J12, 2006, Sydney, NSW, Australia
Albert Zomaya
Division of Computer Engineering, Mokwon University, 88 Do-An-Buk-Ro, Seo-gu, 302-729, Daejeon, Korea
Sang-Soo Yeo
Department of Computer and Information Science and Engineering, University of Florida, CSE 301, 32611, Gainesville, FL, USA
Sartaj Sahni


About this paper

Cite this paper

Luo, Q., Liu, C., Kong, C., Cai, Y. (2012). MAP-numa: Access Patterns Used to Characterize the NUMA Memory Access Optimization Techniques and Algorithms. In: Park, J.J., Zomaya, A., Yeo, SS., Sahni, S. (eds) Network and Parallel Computing. NPC 2012. Lecture Notes in Computer Science, vol 7513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35606-3_24


DOI: https://doi.org/10.1007/978-3-642-35606-3_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35605-6
Online ISBN: 978-3-642-35606-3
eBook Packages: Computer Science, Computer Science (R0)
