Transactional Memory for Reliability

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8913)

Abstract

Technology trends are expected to increase both transient and permanent fault rates in future processors. Providing reliability is therefore becoming a necessity, both for applications running on personal computers and for those running on mission-critical systems. A reliable system requires two key capabilities: 1) error detection and 2) error recovery. Transactional Memory (TM) provides an ideal base for both. First, TM already provides mechanisms to abort a transaction in case of a conflict: all tentative memory updates are discarded or undone, and execution restarts from the beginning of the transaction. A transaction’s start can thus be viewed as a locally checkpointed stable state that can be used for error recovery. Second, transactional semantics allow error detection to be deferred until a transaction commits (or until a value becomes externally visible). This reduces the cost of error detection compared to traditional schemes, in which detection is performed at every instruction [26], while its efficiency can be increased.

In this chapter, we first explain hardware faults and the main aspects of reliability schemes, namely error detection and error recovery. We then describe the major requirements of reliability schemes and how closely they match the basic properties of transactional memory. Finally, we present the current research landscape for reliability schemes that use transactional memory.
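
The two roles of TM described above (transaction start as a checkpoint, and error detection deferred to commit) can be illustrated with a minimal software sketch. It is not taken from the chapter: the names `run_reliable_tx` and `make_faulty_increment` are hypothetical, and comparing two redundant executions at commit is only one assumed way to detect an error.

```python
from typing import Callable, Dict

Memory = Dict[str, int]
# A transaction body reads from a snapshot and records tentative writes in a buffer.
TxBody = Callable[[Memory, Memory], None]


def run_reliable_tx(memory: Memory, body: TxBody, max_retries: int = 3) -> int:
    """Run `body` as a transaction and return how many aborts were needed."""
    for attempt in range(max_retries + 1):
        # Transaction start is the checkpoint: `memory` stays untouched until
        # commit, so recovery only requires dropping the write buffers.
        writes_a: Memory = {}
        writes_b: Memory = {}
        body(dict(memory), writes_a)  # primary execution
        body(dict(memory), writes_b)  # redundant execution (assumed detection scheme)

        # Detection is deferred to commit: the write sets are compared once,
        # instead of checking every instruction.
        if writes_a == writes_b:
            memory.update(writes_a)  # commit: tentative updates become visible
            return attempt
        # Mismatch: treat it as a detected error, discard the buffers, retry.
    raise RuntimeError("transaction failed validation on every attempt")


def make_faulty_increment(fault_on_call: int) -> TxBody:
    """Build a body that increments `counter`, with one simulated bit flip."""
    calls = {"n": 0}

    def body(snapshot: Memory, writes: Memory) -> None:
        calls["n"] += 1
        value = snapshot["counter"] + 1
        if calls["n"] == fault_on_call:
            value ^= 0x4  # simulated transient fault in one execution
        writes["counter"] = value

    return body


if __name__ == "__main__":
    mem = {"counter": 10}
    aborts = run_reliable_tx(mem, make_faulty_increment(fault_on_call=2))
    print(aborts, mem)  # prints: 1 {'counter': 11}
```

In this sketch the corrupted value never reaches `memory`, because the mismatch is caught before commit and the transaction simply re-executes from its start.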

References

  1. Adir, A., Goodman, D., Hershcovich, D., Hershkovitz, O., Hickerson, B., Holtz, K., Kadry, W., Koyfman, A., Ludden, J., Meissner, C., Nahir, A., Pratt, R.R., Schiffli, M., Onge, B., Thompto, B., Tsanko, E., Ziv, A.: Verification of transactional memory in POWER8. In: Proceedings of the 51st Annual Design Automation Conference, pp. 58:1–58:6 (2014)

  2. Agarwal, R., Garg, P., Torrellas, J.: Rebound: scalable checkpointing for coherent shared memory. In: Proceedings of the 38th Annual International Symposium on Computer Architecture, ISCA 2011, pp. 153–164 (2011)

  3. Franklin, M., et al.: Built-in Self-Testing of Random-Access Memories. IEEE Computer 23(10) (October 1990)

  4. Wells, P.M., et al.: Adapting to Intermittent Faults in Multicore Systems. In: Proceedings of the 13th ASPLOS, pp. 255–264 (2008)

  5. Baumann, R.: Soft errors in advanced computer systems. IEEE Design and Test 22, 258–266 (2005)

  6. Bidokhti, N.: SEU Concept to Reality (Allocation, Prediction, Mitigation). In: RAMS (2010)

  7. Bieniusa, A., Fuhrmann, T.: Consistency in hindsight: A fully decentralized STM algorithm, pp. 1–12 (2010)

  8. Bocchino, R.L., Adve, V.S., Chamberlain, B.L.: Software transactional memory for large scale clusters. In: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 247–258 (2008)

  9. Carvalho, N., Romano, P., Rodrigues, L.: A generic framework for replicated software transactional memories. In: Proceedings of the Tenth IEEE International Symposium on Networking Computing and Applications, pp. 271–274 (2011)

  10. Chen, D.: Local Rollback for Fault-Tolerance in Parallel Computing Systems, United States Patent Application, 12/696780 (2011)

  11. Constantinescu, C.: Trends and challenges in VLSI circuit reliability. IEEE Micro 23, 14–19 (2003)

  12. Couceiro, M., Romano, P., Carvalho, N., Rodrigues, L.: D2STM: Dependable distributed software transactional memory. In: Proceedings of the 2009 15th IEEE Pacific Rim International Symposium on Dependable Computing, pp. 307–313 (2009)

  13. Dhoke, A., Ravindran, B., Zhang, B.: On closed nesting and checkpointing in fault-tolerant distributed transactional memory. In: IEEE International Symposium on Parallel and Distributed Processing, pp. 41–52 (2013)

  14. Fetzer, C., Felber, P.: Transactional memory for dependable embedded systems. In: 7th Workshop on Hot Topics in System Dependability (HotDep), pp. 223–227. IEEE (2011)

  15. Gong, R., Dai, K., Wang, Z.: Transient Fault Recovery on Chip Multiprocessor based on Dual Core Redundancy and Context Saving. In: International Conference for Young Computer Scientists, pp. 148–153 (2008)

  16. Hammond, L., Wong, V., Chen, M., Carlstrom, B.D., Davis, J.D., Hertzberg, B., Prabhu, M.K., Wijaya, H., Kozyrakis, C., Olukotun, K.: Transactional memory coherence and consistency. SIGARCH Computer Architecture News 32(2), 102 (2004)

  17. Kotselidis, C., Ansari, M., Jarvis, K., Luján, M., Kirkham, C., Watson, I.: DiSTM: A software transactional memory framework for clusters. In: Proceedings of the International Conference on Parallel Processing (ICPP), pp. 51–58 (2008)

  18. Michalak, S.E., Harris, K.W., Hengartner, N.W., Takala, B.E., Wender, S.A.: Predicting the Number of Fatal Soft Errors in Los Alamos National Laboratory’s ASC Q Computer. IEEE Transactions on Device and Materials Reliability 5, 329–335 (2005)

  19. Moore, K., Bobba, J., Moravan, M., Hill, M., Wood, D.: LogTM: log-based transactional memory, vol. 12, pp. 254–265. Austin, Texas (2006)

  20. Mukherjee, S.S., Kontz, M., Reinhardt, S.K.: Detailed Design and Evaluation of Redundant Multithreading Alternatives. In: Proceedings of the International Symposium on Computer Architecture, pp. 99–110 (2002)

  21. Mukherjee, S.: Architecture Design for Soft Errors (2008)

  22. Rashid, L., Pattabiraman, K., Gopalakrishnan, S.: Towards understanding the effects of intermittent hardware faults on programs. Dependable Systems and Networks Workshops, 101–106 (2010)

  23. Riegel, T., Felber, P., Fetzer, C.: Composable error recovery with transactional memory. Bulletin of the European Association for Theoretical Computer Science (BEATCS) 99 (2009)

  24. Romano, P., Rodrigues, L., Carvalho, N., Cachopo, J.: Cloud-TM: Harnessing the cloud with distributed transactional memories. SIGOPS Oper. Syst. Rev. 44(2), 1–6 (2010)

  25. Sanchez, D., Cebrian, J.M., Garcia, J.M., Aragon, J.L.: Soft-error mitigation by means of decoupled transactional memory threads. Distributed Computing, 1–16 (2014)

  26. Slegel, T.J.A.: IBM’s S/390 G5 Microprocessor Design. IEEE Micro 19, 12–23 (1999)

  27. Tomić, S., Perfumo, C., Kulkarni, C., Armejach, A., Cristal, A., Unsal, O., Harris, T., Valero, M.: EazyHTM: eager-lazy hardware transactional memory. In: MICRO-42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, New York, NY, USA, pp. 145–155 (2009)

  28. Wamhoff, J.-T., Schwalbe, M., Faqeh, R., Fetzer, C., Felber, P.: Transactional encoding for tolerating transient hardware errors. In: Higashino, T., Katayama, Y., Masuzawa, T., Potop-Butucaru, M., Yamashita, M. (eds.) SSS 2013. LNCS, vol. 8255, pp. 1–16. Springer, Heidelberg (2013)

  29. Weaver, C., Emer, J., Mukherjee, S.S., Reinhardt, S.K.: Techniques to Reduce the Soft Error Rate of a High-Performance Microprocessor. In: Proceedings of the 31st Annual International Symposium on Computer Architecture, pp. 264–275 (2004)

  30. Wood, A., Jardine, R., Bartlett, W.: Data integrity in HP NonStop servers. In: Workshop on SELSE (2006)

  31. Yalcin, G., Unsal, O., Cristal, A.: FaulTM: Fault-Tolerance Using Hardware Transactional Memory. In: Design, Automation and Test in Europe DATE (2012)

  32. Yalcin, G., Unsal, O., Cristal, A.: Fault Tolerance for Multi-Threaded Applications by Leveraging Hardware Transactional Memory. In: International Conference on Computing Frontiers (2013)

  33. Yalcin, G., Unsal, O., Cristal, A., Hur, I., Valero, M.: FaulTM: Fault-Tolerance Using Hardware Transactional Memory. In: Workshop on Parallel Execution of Sequential Programs on Multi-Core Architecture PESPMA (2010)

  34. Yalcin, G., Unsal, O.S., Cristal, A., Hur, I., Valero, M.: SymptomTM: Symptom-Based Error Detection and Recovery Using Hardware Transactional Memory. In: International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 199–200. IEEE (2011)

  35. Yoo, R.M., Hughes, C.J., Lai, K., Rajwar, R.: Performance evaluation of Intel transactional synchronization extensions for high-performance computing. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 19:1–19:11 (2013)

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Yalcin, G., Unsal, O. (2015). Transactional Memory for Reliability. In: Guerraoui, R., Romano, P. (eds) Transactional Memory. Foundations, Algorithms, Tools, and Applications. Lecture Notes in Computer Science, vol 8913. Springer, Cham. https://doi.org/10.1007/978-3-319-14720-8_13

  • DOI: https://doi.org/10.1007/978-3-319-14720-8_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14719-2

  • Online ISBN: 978-3-319-14720-8

  • eBook Packages: Computer Science, Computer Science (R0)
