Balancing Shared and Distributed Heaps on NUMA Architectures

Aljabri, Malak; Loidl, Hans-Wolfgang; Trinder, Phil

doi:10.1007/978-3-319-14675-1_1

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8843)

Included in the following conference series:

International Symposium on Trends in Functional Programming

Abstract

Due to the varying latencies between memory banks, efficient shared memory access is challenging on modern NUMA architectures. This has a major impact on the shared memory performance of parallel programs, particularly those written in languages with automatic memory management.

This paper presents a performance evaluation of distributed and shared heap implementations of parallel Haskell on a state-of-the-art physical shared-memory NUMA machine. The evaluation exposes bottlenecks in shared-memory management that limit scalability beyond 25 of the machine's 48 cores.

We demonstrate that a hybrid system, GUMSMP, that combines both distributed and shared heap abstractions consistently outperforms the shared memory GHC implementation on seven benchmarks by a factor of 3.3 on average. Specifically, we show that the best results are obtained when sharing memory only within a single NUMA region, and using distributed memory system abstractions across the regions.
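The hybrid layout described above can be illustrated with a minimal sketch. It is not the GUMSMP implementation; it only shows the grouping idea, in which cores in the same NUMA region share one heap and cores in different regions communicate through distributed-memory abstractions. The figure of 6 cores per region, the contiguous core numbering, and the names `plan_hybrid_layout` and `same_heap` are assumptions made for illustration and do not come from the paper.

```python
from typing import List


def plan_hybrid_layout(total_cores: int, cores_per_region: int) -> List[List[int]]:
    """Group core ids into regions; each region stands for one shared heap.

    Assumption: cores are numbered contiguously within a NUMA region.
    Real machines may number cores differently.
    """
    if cores_per_region <= 0 or total_cores % cores_per_region != 0:
        raise ValueError("cores must divide evenly into regions")
    return [
        list(range(start, start + cores_per_region))
        for start in range(0, total_cores, cores_per_region)
    ]


def same_heap(layout: List[List[int]], core_a: int, core_b: int) -> bool:
    """True if both cores fall in one region and so would share a heap."""
    return any(core_a in region and core_b in region for region in layout)


# Hypothetical configuration: 48 cores (from the abstract), 6 per region (assumed).
layout = plan_hybrid_layout(48, 6)
for index, region in enumerate(layout):
    print(f"region {index}: cores {region[0]}-{region[-1]} share one heap")
```

In this sketch, two cores for which `same_heap` returns `False` would exchange data by message passing rather than through shared memory.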

Author information

Authors and Affiliations

School of Computing Science, University of Glasgow, Glasgow, G12 8QQ, Scotland, UK
Malak Aljabri & Phil Trinder
School of Mathematical and Computer Sciences, Heriot-Watt University, Riccarton, Edinburgh, EH14 4AS, Scotland, UK
Hans-Wolfgang Loidl

Corresponding author

Correspondence to Malak Aljabri.

Editor information

Editors and Affiliations

Utrecht University, Utrecht, The Netherlands
Jurriaan Hage
Vassar College, Wappingers Falls, New York, USA
Jay McCarthy

About this paper

Cite this paper

Aljabri, M., Loidl, H.-W., Trinder, P. (2015). Balancing Shared and Distributed Heaps on NUMA Architectures. In: Hage, J., McCarthy, J. (eds) Trends in Functional Programming. TFP 2014. Lecture Notes in Computer Science, vol 8843. Springer, Cham. https://doi.org/10.1007/978-3-319-14675-1_1

DOI: https://doi.org/10.1007/978-3-319-14675-1_1
Published: 27 December 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14674-4
Online ISBN: 978-3-319-14675-1
eBook Packages: Computer Science, Computer Science (R0)
