Abstract
Due to the varying latencies between memory banks, efficient shared memory access is challenging on modern NUMA architectures. This has a major impact on the shared memory performance of parallel programs, particularly those written in languages with automatic memory management.
This paper presents a performance evaluation of distributed and shared heap implementations of parallel Haskell on a state-of-the-art physical shared memory NUMA machine. The evaluation exposes bottlenecks in shared-memory management that limit scalability beyond 25 of the 48 cores.
We demonstrate that a hybrid system, GUMSMP, which combines distributed and shared heap abstractions, consistently outperforms the shared memory GHC implementation on seven benchmarks, by a factor of 3.3 on average. Specifically, we show that the best results are obtained when memory is shared only within a single NUMA region and distributed memory system abstractions are used across regions.
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Aljabri, M., Loidl, H.-W., Trinder, P. (2015). Balancing Shared and Distributed Heaps on NUMA Architectures. In: Hage, J., McCarthy, J. (eds) Trends in Functional Programming. TFP 2014. Lecture Notes in Computer Science, vol 8843. Springer, Cham. https://doi.org/10.1007/978-3-319-14675-1_1
DOI: https://doi.org/10.1007/978-3-319-14675-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14674-4
Online ISBN: 978-3-319-14675-1
eBook Packages: Computer Science (R0)