Abstract
As cache capacity grows dramatically with each new processor generation, soft errors originating in cache memories have become a major reliability concern for high-performance processors. This paper presents an application-specific soft error vulnerability analysis to understand how an application responds to soft errors in different cache levels. Building on a high-performance processor simulator called Graphite, we have implemented a fault injection framework that can selectively inject bit flips into different cache levels. We simulated a wide range of relevant bit error patterns and measured the applications’ vulnerability to bit errors. Our experimental results show that applications differ in their vulnerability to bit errors in different cache levels (e.g., for a given cache, the application failure rate of one program is more than double that of another); the results also indicate the probabilities of different failure behaviors for the given applications.
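The fault injection idea summarized in the abstract can be illustrated with a minimal, self-contained sketch. It is not the paper's framework and does not use Graphite's API: the 64-byte "cache line", the toy workload, the adjacent-bit error pattern, and all function names (`flip_bits`, `adjacent_pattern`, `failure_rate`) are assumptions made purely for illustration. The sketch flips bits in a copy of the data, reruns the workload, and counts a trial as a failure when the output differs from the fault-free run.

```python
import random


def flip_bits(line: bytearray, bit_positions):
    """Flip the given bit positions of `line` in place and return it."""
    for pos in bit_positions:
        byte_index, bit_index = divmod(pos, 8)
        line[byte_index] ^= 1 << bit_index
    return line


def adjacent_pattern(rng, line_bits, width):
    """Pick `width` adjacent bit positions (an assumed multi-bit error pattern)."""
    start = rng.randrange(line_bits - width + 1)
    return list(range(start, start + width))


def toy_workload(data):
    """Hypothetical application: reads only the low 4 bits of each byte,
    so flips that land in the high 4 bits are masked."""
    return sum(b & 0x0F for b in data)


def failure_rate(workload, clean_data, width, trials, seed=0):
    """Fraction of injection trials whose output differs from the fault-free run."""
    rng = random.Random(seed)
    golden = workload(clean_data)
    failures = 0
    for _ in range(trials):
        line = bytearray(clean_data)  # fresh copy for every trial
        flip_bits(line, adjacent_pattern(rng, len(line) * 8, width))
        if workload(line) != golden:
            failures += 1
    return failures / trials


if __name__ == "__main__":
    data = bytes(range(64))  # stand-in for one 64-byte cache line
    for width in (1, 2, 8):
        rate = failure_rate(toy_workload, data, width, trials=1000)
        print(f"{width}-bit pattern: failure rate {rate:.2f}")
```

In this toy setting a flip is masked whenever it touches only bits the workload never reads, which is the kind of application-dependent behavior the analysis sets out to quantify. A real study would additionally distinguish failure behaviors rather than using a single pass/fail comparison.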
This work is funded by Intel and by the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT).
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ma, Z., Carlson, T., Heirman, W., Eeckhout, L. (2012). Evaluating Application Vulnerability to Soft Errors in Multi-level Cache Hierarchy. In: Alexander, M., et al. Euro-Par 2011: Parallel Processing Workshops. Euro-Par 2011. Lecture Notes in Computer Science, vol 7156. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29740-3_31
DOI: https://doi.org/10.1007/978-3-642-29740-3_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29739-7
Online ISBN: 978-3-642-29740-3
eBook Packages: Computer Science (R0)