Abstract
This paper presents a set of typical memory access patterns, implemented in C, that can serve as a benchmark for characterizing techniques and algorithms that aim to improve NUMA memory access performance. These access patterns, called MAP-numa (Memory Access Patterns for NUMA), currently comprise three classes whose working data sets correspond to a 1-dimensional array, a 2-dimensional matrix, and a 3-dimensional cube. MAP-numa is dedicated to evaluating NUMA memory access optimizations rather than to measuring memory bandwidth and latency, and is thus an alternative to existing benchmarks such as STREAM and pChase. It is used to verify the capability of optimizations (made automatically or manually, to source code or executable binaries) by investigating which locality leakage can be remedied. Experimental results give an example of using MAP-numa to evaluate several optimizations based on OProfile sampling.
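To make the three working-set classes concrete, the following is a minimal C sketch of traversals over a 1-dimensional array, a 2-dimensional matrix, and a 3-dimensional cube. It is not the MAP-numa source: the function names (`sum_1d`, `sum_2d_rows`, `sum_2d_cols`, `sum_3d`), the flat row-major layout, and the use of a running sum as the access kernel are all assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1-D pattern: visit every element of a[0..n) in passes of
 * the given stride. stride == 1 is a plain sequential sweep; larger
 * strides touch memory in a more scattered order. */
double sum_1d(const double *a, size_t n, size_t stride)
{
    assert(stride > 0);
    double s = 0.0;
    for (size_t start = 0; start < stride; start++)
        for (size_t i = start; i < n; i += stride)
            s += a[i];
    return s;
}

/* Hypothetical 2-D pattern, row-wise: the inner loop walks contiguous
 * memory because the matrix is assumed to be stored row-major. */
double sum_2d_rows(const double *m, size_t rows, size_t cols)
{
    double s = 0.0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            s += m[r * cols + c];
    return s;
}

/* Hypothetical 2-D pattern, column-wise: the same elements, but the
 * inner loop jumps by a whole row (cols elements) on each step. */
double sum_2d_cols(const double *m, size_t rows, size_t cols)
{
    double s = 0.0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            s += m[r * cols + c];
    return s;
}

/* Hypothetical 3-D pattern: sweep an nx * ny * nz cube stored as a flat
 * array, with z as the fastest-varying index. */
double sum_3d(const double *cube, size_t nx, size_t ny, size_t nz)
{
    double s = 0.0;
    for (size_t x = 0; x < nx; x++)
        for (size_t y = 0; y < ny; y++)
            for (size_t z = 0; z < nz; z++)
                s += cube[(x * ny + y) * nz + z];
    return s;
}
```

Each function returns the same total for the same data regardless of traversal order, so the order can be varied without changing the result. In a NUMA experiment one would typically run such kernels from threads pinned to different nodes and compare local against remote access counts; that setup is outside the scope of this sketch.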
Copyright information
© 2012 IFIP International Federation for Information Processing
About this paper
Cite this paper
Luo, Q., Liu, C., Kong, C., Cai, Y. (2012). MAP-numa: Access Patterns Used to Characterize the NUMA Memory Access Optimization Techniques and Algorithms. In: Park, J.J., Zomaya, A., Yeo, SS., Sahni, S. (eds) Network and Parallel Computing. NPC 2012. Lecture Notes in Computer Science, vol 7513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35606-3_24
DOI: https://doi.org/10.1007/978-3-642-35606-3_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35605-6
Online ISBN: 978-3-642-35606-3
eBook Packages: Computer Science (R0)