Abstract
Balancing the cost and quality of offered IT services is a challenging task for data center providers. In the case of availability, fault tolerance can be achieved by introducing redundancy mechanisms into the service design. Redundancy allocation problems can be defined as combinatorial optimization problems that identify cost-effective redundancy configurations in which availability objectives are met. However, such approaches should be flexible enough to trade off effort against benefit in a specific scenario. This chapter therefore proposes a redundancy allocation problem that can model the specific characteristics of the IT system under analysis. To identify suitable design configurations, a generic Petri net simulation model is combined with a genetic algorithm. Because the solution algorithm is defined adaptively to the complexity of the problem definition under consideration, users can reduce both modeling and computational effort. The suitability of the approach is demonstrated in the use case of an international application service provider.
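To make the idea of a redundancy allocation problem concrete, the following minimal sketch searches for a low-cost configuration that meets an availability objective using a simple genetic algorithm. It is an illustration under stated assumptions, not the chapter's method: the Petri net simulation is replaced here by a closed-form availability formula for a series system of parallel subsystems, and the component data, `TARGET`, `MAX_N`, `PENALTY`, and all function names are hypothetical.

```python
import random
from math import prod

# Hypothetical data: (availability of one instance, cost of one instance)
COMPONENTS = [(0.95, 4.0), (0.90, 2.5), (0.98, 6.0)]
TARGET = 0.999   # assumed availability objective
MAX_N = 5        # assumed upper bound on instances per subsystem
PENALTY = 1e3    # added to the cost of configurations that miss TARGET


def availability(config):
    """Availability of a series system whose subsystems are parallel-redundant.

    Closed-form stand-in for a simulation model (an assumption of this sketch).
    """
    return prod(1 - (1 - a) ** n for (a, _), n in zip(COMPONENTS, config))


def cost(config):
    """Total cost: instance cost times number of instances, summed."""
    return sum(c * n for (_, c), n in zip(COMPONENTS, config))


def fitness(config):
    """Cost plus a fixed penalty if the objective is missed (lower is better)."""
    return cost(config) + (PENALTY if availability(config) < TARGET else 0.0)


def genetic_search(pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    k = len(COMPONENTS)
    # Start from random configurations plus the fully redundant one,
    # which is feasible for the data above.
    pop = [[rng.randint(1, MAX_N) for _ in range(k)] for _ in range(pop_size - 1)]
    pop.append([MAX_N] * k)
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 2]          # elitism: keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randint(1, k - 1)       # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.3:            # mutate one subsystem
                child[rng.randrange(k)] = rng.randint(1, MAX_N)
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)


best = genetic_search()
print(best, round(availability(best), 6), cost(best))
```

Handling the availability objective as a penalty term keeps the search unconstrained. In a simulation-based setting, the `availability` function would be the part replaced by a model evaluation.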
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Bosse, S., Turowski, K. (2017). Optimization of Data Center Fault Tolerance Design. In: Marx Gómez, J., Mora, M., Raisinghani, M., Nebel, W., O'Connor, R. (eds) Engineering and Management of Data Centers. Service Science: Research and Innovations in the Service Economy. Springer, Cham. https://doi.org/10.1007/978-3-319-65082-1_7
DOI: https://doi.org/10.1007/978-3-319-65082-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65081-4
Online ISBN: 978-3-319-65082-1
eBook Packages: Computer Science (R0)