International Journal of Parallel Programming, Volume 46, Issue 6, pp 1304–1328

Hardware Transactional Memory Exploration in Coherence-Free Many-Core Architectures

  • Dimitra Papagiannopoulou
  • Andrea Marongiu
  • Tali Moreshet
  • Luca Benini
  • Maurice Herlihy
  • R. Iris Bahar


High-end embedded systems, like their general-purpose counterparts, are turning to many-core, cluster-based architectures that provide a shared-memory abstraction subject to non-uniform memory access costs. To keep the cores and memory hierarchy simple, many-core embedded systems tend to employ simple, scratchpad-like memories rather than hardware-managed caches, which require some form of cache-coherence management. These “coherence-free” systems still need some means to synchronize memory accesses and guarantee memory consistency. Conventional lock-based approaches can provide this synchronization, but they may lead to both usability and performance issues. Speculative synchronization, such as hardware transactional memory, may be a more attractive approach. However, hardware speculative techniques traditionally rely on the underlying cache-coherence protocol to synchronize memory accesses among the cores, so the lack of such a protocol adds new challenges to the design of hardware speculative support. In this article, we present a new scheme for hardware transactional memory (HTM) support within a cluster-based, many-core embedded system that lacks an underlying cache-coherence protocol. We propose two alternative data versioning implementations for the HTM support, Full-Mirroring and Distributed Logging, and we compare their performance. To the best of our knowledge, these are the first designs for speculative synchronization for this type of architecture. Through a set of benchmark experiments on our simulation platform, we show that our designs can achieve significant performance improvements over traditional lock-based schemes.


  • Transactional memory
  • Embedded systems
  • Parallel processing
  • Coherence-free memory architectures


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Dimitra Papagiannopoulou (1)
  • Andrea Marongiu (2, 3)
  • Tali Moreshet (4)
  • Luca Benini (2, 3)
  • Maurice Herlihy (1)
  • R. Iris Bahar (1)

  1. Brown University, Providence, USA
  2. ETH Zurich, Zurich, Switzerland
  3. DEI, University of Bologna, Bologna, Italy
  4. Boston University, Boston, USA
