Redundant Execution on Heterogeneous Multi-cores Utilizing Transactional Memory

  • Rico Amslinger
  • Sebastian Weis
  • Christian Piatka
  • Florian Haas
  • Theo Ungerer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10793)


Cycle-by-cycle lockstep execution as implemented by current embedded processors is unsuitable for energy-efficient heterogeneous multi-cores, because the different cores are not cycle synchronous. Furthermore, current and future safety-critical applications demand fail-operational execution, which requires mechanisms for error recovery.

In this paper, we propose a loosely-coupled redundancy approach which combines an in-order with an out-of-order core and utilizes transactional memory for error recovery. The critical program is run in dual-modular redundancy on the out-of-order and the in-order core. The memory accesses of the out-of-order core are used to prefetch for the in-order core. The transactional memory system’s checkpointing mechanism is leveraged to recover from errors. The resulting system runs up to 2.9 times faster than a lockstep system consisting of two in-order cores and consumes up to 35% less energy at the same performance than a lockstep system consisting of two out-of-order cores.


Fault tolerance Multi-core Heterogeneous system Transactional memory Cache 


  1. 1.
    ARM Ltd.: Cortex-R5 and Cortex-R5F - Technical Reference Manual (2011). Revision r1p1
  2. 2.
    ARM Ltd.: big.LITTLE Technology: The Future of Mobile (2013).
  3. 3.
    Austin, T.M., Sohi, G.S.: Zero-cycle loads: microarchitecture support for reducing load latency. In: Proceedings of the 28th Annual International Symposium on Microarchitecture, pp. 82–92 (1995)Google Scholar
  4. 4.
    Baer, J.L., Chen, T.F.: An effective on-chip preloading scheme to reduce data access penalty. In: Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, Supercomputing 1991, pp. 176–186. ACM (1991)Google Scholar
  5. 5.
    Bernick, D., Bruckert, B., Vigna, P., Garcia, D., Jardine, R., Klecka, J., Smullen, J.: NonStop® advanced architecture. In: International Conference on Dependable Systems and Networks (DSN), pp. 12–21 (2005)Google Scholar
  6. 6.
    Binkert, N., Beckmann, B., Black, G., Reinhardt, S.K., Saidi, A., Basu, A., Hestness, J., Hower, D.R., Krishna, T., Sardashti, S., et al.: The gem5 simulator. ACM SIGARCH Comput. Archit. News 39(2), 1–7 (2011)CrossRefGoogle Scholar
  7. 7.
    Butko, A., Bruguier, F., Gamatié, A., Sassatelli, G., Novo, D., Torres, L., Robert, M.: Full-system simulation of big.LITTLE multicore architecture for performance and energy exploration. In: IEEE 10th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip (MCSoC), pp. 201–208. IEEE (2016)Google Scholar
  8. 8.
    Freescale Semiconductor: Safety Manual for Qorivva MPC5643L (2013).
  9. 9.
    Haas, F., Weis, S., Metzlaff, S., Ungerer, T.: Exploiting Intel TSX for fault-tolerant execution in safety-critical systems. In: IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), pp. 197–202 (2014)Google Scholar
  10. 10.
    Haas, F., Weis, S., Ungerer, T., Pokam, G., Wu, Y.: Fault-tolerant execution on COTS multi-core processors with hardware transactional memory support. In: Knoop, J., Karl, W., Schulz, M., Inoue, K., Pionteck, T. (eds.) ARCS 2017. LNCS, vol. 10172, pp. 16–30. Springer, Cham (2017). CrossRefGoogle Scholar
  11. 11.
    Hammarlund, P., Martinez, A.J., Bajwa, A.A., Hill, D.L., Hallnor, E., Jiang, H., Dixon, M., Derr, M., Hunsaker, M., Kumar, R., et al.: Haswell: the fourth-generation Intel core processor. IEEE Micro 34(2), 6–20 (2014)CrossRefGoogle Scholar
  12. 12.
    Infineon Technologies AG: Highly integrated and performance optimized 32-bit microcontrollers for automotive and industrial applications (2017).
  13. 13.
    Klauser, A., Austin, T., Grunwald, D., Calder, B.: Dynamic hammock predication for non-predicated instruction set architectures. In: International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 278–285 (1998)Google Scholar
  14. 14.
    LaFrieda, C., Ipek, E., Martinez, J., Manohar, R.: Utilizing dynamically coupled cores to form a resilient chip multiprocessor. In: 37th International Conference on Dependable Systems and Networks (DSN), pp. 317–326 (2007)Google Scholar
  15. 15.
    Reinhardt, S.K., Mukherjee, S.S.: Transient fault detection via simultaneous multithreading. In: 27th Annual International Symposium on Computer Architecture (ISCA), pp. 25–36. ACM (2000)Google Scholar
  16. 16.
    Rotenberg, E.: AR-SMT: a microarchitectural approach to fault tolerance in microprocessors. In: 29th International Symposium on Fault-Tolerant Computing (FTCS), pp. 84–91 (1999)Google Scholar
  17. 17.
    Sánchez, D., Aragón, J., Garcıa, J.: A log-based redundant architecture for reliable parallel computation. In: International Conference on High Performance Computing (HiPC) (2010)Google Scholar
  18. 18.
    Sundaramoorthy, K., Purser, Z., Rotenberg, E.: Slipstream processors: improving both performance and fault tolerance. ACM SIGPLAN Not. 35(11), 257–268 (2000)CrossRefGoogle Scholar
  19. 19.
    Yalcin, G., Unsal, O., Cristal, A.: FaulTM: error detection and recovery using hardware transactional memory. In: Conference on Design, Automation and Test in Europe (DATE), pp. 220–225 (2013)Google Scholar
  20. 20.
    Yen, L., Bobba, J., Marty, M.R., Moore, K.E., Volos, H., Hill, M.D., Swift, M.M., Wood, D.A.: LogTM-SE: decoupling hardware transactional memory from caches. In: IEEE 13th International Symposium on High Performance Computer Architecture (HPCA), pp. 261–272 (2007)Google Scholar

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Rico Amslinger
    • 1
  • Sebastian Weis
    • 1
  • Christian Piatka
    • 1
  • Florian Haas
    • 1
  • Theo Ungerer
    • 1
  1. 1.University of AugsburgAugsburgGermany

Personalised recommendations