Advertisement

Extending hardware transactional memory capacity via rollback-only transactions and suspend/resume

POWER8 TM
  • Shady Issa
  • Pascal FelberEmail author
  • Alexander Matveev
  • Paolo Romano
Article

Abstract

Transactional memory (TM) aims at simplifying concurrent programming via the familiar abstraction of atomic transactions. Recently, Intel and IBM have integrated hardware based TM (HTM) implementations in commodity processors, paving the way for the mainstream adoption of the TM paradigm. Yet, existing HTM implementations suffer from a crucial limitation, which hampers the adoption of HTM as a general technique for regulating concurrent access to shared memory: the inability to execute transactions whose working sets exceed the capacity of CPU caches. In this article we propose P8TM, a novel approach that mitigates this limitation on IBM’s POWER8 architecture by leveraging a key combination of hardware and software techniques to support different execution paths. P8TM also relies on self-tuning mechanisms aimed at dynamically switching between different execution modes to best adapt to the workload characteristics. In-depth evaluation with several benchmarks indicates that P8TM can achieve striking performance gains in workloads that stress the capacity limitations of HTM, while achieving performance on par with HTM even in unfavourable workloads.

Keywords

Hardware transactional memory Parallel programming Concurrency Self-tuning 

Notes

Acknowledgements

This work was supported by Portuguese funds through Fundação para a Ciência e Tecnologia via projects UID/CEC/50021/2019 and PTDC/EEISCR/1743/2014.

References

  1. 1.
    Afek, Y., Levy, A., Morrison, A.: Programming with hardware lock elision. In: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’13, pp. 295–296. ACM, New York (2013).  https://doi.org/10.1145/2442516.2442552
  2. 2.
    Arbel, M., Attiya, H.: Concurrent updates with RCU: Search tree as an example. In: Proceedings of the 2014 ACM Symposium on Principles of Distributed Computing, PODC ’14, pp. 196–205. ACM, New York (2014).  https://doi.org/10.1145/2611462.2611471
  3. 3.
    Auer, P.: Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res. 3, 397–422 (2003)MathSciNetzbMATHGoogle Scholar
  4. 4.
    Berry, D., Fristedt, B.: Bandit Problems. Chapman and Hall, London (1985)CrossRefGoogle Scholar
  5. 5.
    Boehm, H., Gottschlich, J., Luchangco, V., Michael, M., Moir, M., Nelson, C., Riegel, T., Shpeisman, T., Wong, M.: Transactional language constructs for C++. ISO/IEC JTC1/SC22 WG21 (C++) (2012)Google Scholar
  6. 6.
    Cain, H.W., Michael, M.M., Frey, B., May, C., Williams, D., Le, H.: Robust architectural support for transactional memory in the power architecture. In: Proceedings of the 40th Annual International Symposium on Computer Architecture, ISCA ’13, pp. 225–236. ACM, New York (2013).  https://doi.org/10.1145/2485922.2485942
  7. 7.
    Calciu, I., Shpeisman, T., Pokam, G., Herlihy, M.: Improved single global lock fallback for best-effort hardware transactional memory. In: 9th ACM SIGPLAN Wkshp. on Transactional Computing (2014)Google Scholar
  8. 8.
    Clements, A.T., Kaashoek, M.F., Zeldovich, N.: Scalable address spaces using RCU balanced trees. In: Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVII, pp. 199–210. ACM, New York (2012).  https://doi.org/10.1145/2150976.2150998
  9. 9.
    Dalessandro, L., Carouge, F., White, S., Lev, Y., Moir, M., Scott, M.L., Spear, M.F.: Hybrid NOrec: A case study in the effectiveness of best effort hardware transactional memory. In: Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVI, pp. 39–52. ACM, New York (2011).  https://doi.org/10.1145/1950365.1950373
  10. 10.
    Damron, P., Fedorova, A., Lev, Y., Luchangco, V., Moir, M., Nussbaum, D.: Hybrid transactional memory. In: Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XII, pp. 336–346. ACM, New York (2006).  https://doi.org/10.1145/1168857.1168900
  11. 11.
    Di Sanzo, P., Ciciani, B., Palmieri, R., Quaglia, F., Romano, P.: On the analytical modeling of concurrency control algorithms for software transactional memories: the case of commit-time-locking. Perform. Eval. 69(5), 187–205 (2012).  https://doi.org/10.1016/j.peva.2011.05.002 CrossRefGoogle Scholar
  12. 12.
    di Sanzo, P., Quaglia, F., Palmieri, R.: Analytical modelling of commit-time-locking algorithms for software transactional memories. In: Int. CMG Conference (2010)Google Scholar
  13. 13.
    Dice, D., Harris, T.L., Kogan, A., Lev, Y., Moir, M.: Hardware extensions to make lazy subscription safe. CoRR arXiv:1407.6968 (2014)
  14. 14.
    Dice, D., Lev, Y., Moir, M., Nussbaum, D.: Early experience with a commercial hardware transactional memory implementation. In: Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XIV, pp. 157–168. ACM, New York (2009).  https://doi.org/10.1145/1508244.1508263
  15. 15.
    Dice, D., Shavit, N.: Understanding tradeoffs in software transactional memory. In: Proceedings of the International Symposium on Code Generation and Optimization, CGO ’07, pp. 21–33. IEEE Computer Society, Washington, DC (2007).  https://doi.org/10.1109/CGO.2007.38
  16. 16.
    Diegues, N., Romano, P.: Self-tuning intel transactional synchronization extensions. In: 11th International Conference on Autonomic Computing (ICAC 14), pp. 209–219. USENIX Association, Philadelphia (2014). https://www.usenix.org/conference/icac14/technical-sessions/presentation/diegues
  17. 17.
    Diegues, N., Romano, P., Garbatov, S.: Seer: probabilistic scheduling for hardware transactional memory. In: Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’15, pp. 224–233. ACM, New York (2015).  https://doi.org/10.1145/2755573.2755578. http://doi.acm.org.focus.lib.kth.se/10.1145/2755573.2755578
  18. 18.
    Diegues, N., Romano, P., Rodrigues, L.: Virtues and limitations of commodity hardware transactional memory. In: Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT ’14, pp. 3–14. ACM, New York (2014).  https://doi.org/10.1145/2628071.2628080
  19. 19.
    Evan Jones: tpccbench. https://github.com/evanj/tpccbench (2007). Accessed 23 Sept 2019
  20. 20.
    Felber, P., Issa, S., Matveev, A., Romano, P.: Hardware read-write lock elision. In: Proceedings of the Eleventh European Conference on Computer Systems, EuroSys ’16, pp. 34:1–34:15. ACM, New York (2016).  https://doi.org/10.1145/2901318.2901346
  21. 21.
    Garivier, A., Moulines, E.: On upper-confidence bound policies for switching bandit problems. In: Proceedings of the 22nd International Conference on Algorithmic Learning Theory, ALT’11, pp. 174–188. Springer, Berlin (2011). http://dl.acm.org/citation.cfm?id=2050345.2050365 Google Scholar
  22. 22.
    Goel, B., Titos-Gil, R., Negi, A., McKee, S.A., Stenstrom, P.: Performance and energy analysis of the restricted transactional memory implementation on haswell. In: 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp. 615–624 (2014).  https://doi.org/10.1109/IPDPS.2014.70
  23. 23.
    Guerraoui, R., Kapalka, M.: On the correctness of transactional memory. In: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’08, pp. 175–184. ACM, New York (2008).  https://doi.org/10.1145/1345206.1345233
  24. 24.
    Harris, T.L.: A pragmatic implementation of non-blocking linked-lists. In: Proceedings of the 15th International Conference on Distributed Computing, DISC ’01, pp. 300–314. Springer, London (2001). http://dl.acm.org/citation.cfm?id=645958.676105 Google Scholar
  25. 25.
    Hart, T.E., McKenney, P.E., Brown, A.D., Walpole, J.: Performance of memory reclamation for lockless synchronization. J. Parallel Distrib. Comput. 67(12), 1270–1285 (2007).  https://doi.org/10.1016/j.jpdc.2007.04.010 CrossRefzbMATHGoogle Scholar
  26. 26.
    Issa, S.: P8TM source code and benchmarks. https://github.com/shadyalaa/POWER8TM (2017). Accessed 23 Sept 2019
  27. 27.
    Jacobi, C., Slegel, T., Greiner, D.: Transactional memory architecture and implementation for IBM System Z. In: Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-45, pp. 25–36. IEEE Computer Society, Washington, DC (2012).  https://doi.org/10.1109/MICRO.2012.12
  28. 28.
    Java docs: ReentrantReadWriteLock. https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html (2018). Accessed 23 Sept 2019
  29. 29.
    Kumar, S., Chu, M., Hughes, C.J., Kundu, P., Nguyen, A.: Hybrid transactional memory. In: Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’06, pp. 209–220. ACM, New York (2006).  https://doi.org/10.1145/1122971.1123003
  30. 30.
    Lai, T., Robbins, H.: Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6, 4–22 (1985)MathSciNetCrossRefGoogle Scholar
  31. 31.
    Le, H.Q., Guthrie, G.L., Williams, D.E., Michael, M.M., Frey, B.G., Starke, W.J., May, C., Odaira, R., Nakaike, T.: Transactional memory support in the IBM POWER8 processor. IBM J. Res. Dev. 59(1), 8:1–8:14 (2015).  https://doi.org/10.1147/JRD.2014.2380199 CrossRefGoogle Scholar
  32. 32.
    Marathe, V.J., Scherer, W.N., Scott, M.L.: Adaptive software transactional memory. In: Proceedings of the 19th International Conference on Distributed Computing, DISC’05, pp. 354–368. Springer, Berlin (2005).  https://doi.org/10.1007/11561927_26 Google Scholar
  33. 33.
    Matveev, A., Shavit, N.: Reduced Hardware NOrec: a safe and scalable hybrid transactional memory. In: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’15, pp. 59–71. ACM, New York (2015).  https://doi.org/10.1145/2694344.2694393
  34. 34.
    Mckenney, P.E., Slingwine, J.D.: Read-copy update: using execution history to solve concurrency problems. In: Parallel and Distributed Computing and Systems, pp. 509–518. Las Vegas, NV (1998)Google Scholar
  35. 35.
    McKenney, P.E., Walpole, J.: What is RCU, fundamentally? https://lwn.net/Articles/262464/ (2007). Accessed 23 Sept 2019
  36. 36.
    Minh, C.C., Chung, J., Kozyrakis, C., Olukotun, K.: Stamp: Stanford transactional applications for multi-processing. In: IEEE International Symposium on Workload Characterization, 2008. IISWC 2008, pp. 35–46. IEEE (2008)Google Scholar
  37. 37.
    Nakaike, T., Odaira, R., Gaudet, M., Michael, M.M., Tomari, H.: Quantitative comparison of hardware transactional memory for Blue Gene/Q, zEnterprise EC12, Intel Core, and POWER8. In: Proceedings of the 42nd Annual International Symposium on Computer Architecture, ISCA ’15, pp. 144–157. ACM, New York (2015).  https://doi.org/10.1145/2749469.2750403
  38. 38.
    Nussbaum, D., Lev, Y., Moir, M.: PhTM: Phased transactional memory. In: The Second ACM SIGPLAN Workshop on Transactional Computing, TRANSACT ’07. ACM, New York (2007). https://urresearch.rochester.edu/institutionalPublicationPublicView.action?institutionalItemId=4058
  39. 39.
    Ortner, R., Ryabko, D., Auer, P., Munos, R.: Regret bounds for restless markov bandits. In: Proceedings of the 23rd International Conference on Algorithmic Learning Theory, ALT’12, pp. 214–228. Springer, Berlin (2012).  https://doi.org/10.1007/978-3-642-34106-9_19 Google Scholar
  40. 40.
    Raminhas, P., Issa, T., Romano, P.: Enhancing efficiency of hybrid transactional memory via dynamic data partitioning schemes. In: Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid ’18, pp. 1–11 (2018).  https://doi.org/10.1109/CCGRID.2018.94
  41. 41.
    Rossbach, C.J., Hofmann, O.S., Witchel, E.: Is transactional programming actually easier? In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’10, pp. 47–56. ACM, New York (2010).  https://doi.org/10.1145/1693453.1693462
  42. 42.
    Stonebraker, M., Madden, S., Abadi, D.J., Harizopoulos, S., Hachem, N., Helland, P.: The end of an architectural era: (it’s time for a complete rewrite). In: Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB ’07, pp. 1150–1160. VLDB Endowment (2007). http://dl.acm.org/citation.cfm?id=1325851.1325981
  43. 43.
    TPC Council: TPC-C Benchmark. http://www.tpc.org/tpcc (2011). Accessed 23 Sept 2019
  44. 44.
    Wang, A., Gaudet, M., Wu, P., Amaral, J.N., Ohmacht, M., Barton, C., Silvera, R., Michael, M.: Evaluation of Blue Gene/Q hardware support for transactional memories. In: Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, PACT ’12, pp. 127–136. ACM, New York (2012).  https://doi.org/10.1145/2370816.2370836
  45. 45.
    Wang, Q., Kulkarni, S., Cavazos, J., Spear, M.: A transactional memory with automatic performance tuning. ACM Trans. Archit. Code Optim. 8(4), 54:1–54:23 (2012).  https://doi.org/10.1145/2086696.2086733 CrossRefGoogle Scholar
  46. 46.
    Yoo, R.M., Hughes, C.J., Lai, K., Rajwar, R.: Performance evaluation of Intel® transactional synchronization extensions for high-performance computing. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’13, pp. 19:1–19:11. ACM, New York (2013).  https://doi.org/10.1145/2503210.2503232
  47. 47.
    Yu, J.Y., Mannor, S.: Piecewise-stationary bandit problems with side observations. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, pp. 1177–1184. ACM, New York (2009).  https://doi.org/10.1145/1553374.1553524

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Shady Issa
    • 1
  • Pascal Felber
    • 2
    Email author
  • Alexander Matveev
    • 3
  • Paolo Romano
    • 1
  1. 1.INESC-ID/Instituto Superior TécnicoUniversity of LisbonLisbonPortugal
  2. 2.University of NeuchâtelNeuchâtelSwitzerland
  3. 3.MITCambridgeUSA

Personalised recommendations