Abstract
Assessing mission-critical systems is non-trivial, even more so when they adopt Commercial-Off-The-Shelf (COTS) software tools that were not developed according to custom reliability rules. A satisfactory standard certification process involves a model-based estimate of system reliability. Evaluating such a model, however, requires reliability data for the subsystem units as input, and these data are difficult to obtain. A practical issue is the general lack of detailed statistical evaluations derived from field experience. While the research community has made an effective contribution on the hardware side, there is still work to do on the software side. One example is the Cluster Resource Manager (CRM) software that runs on top of clustered systems and is responsible for orchestrating fail-over operations. To the best of our knowledge, no reliability evaluation based on field experience exists for such a component.
In this work, a particular CRM, namely Pacemaker, was tested to estimate the fail-over success probability under different types of resource outages. Pacemaker is one of the most widely adopted CRMs and is used in several Critical Infrastructure (CI) contexts to ensure high availability of their Industrial Control Systems (ICS). Our experiments were conducted on a real clustered ICS, the Train Management System (TMS) of Hitachi Ansaldo STS.
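As an illustration of what estimating a fail-over success probability from repeated outage tests can look like, the following minimal sketch computes a point estimate and an exact one-sided (Clopper-Pearson) lower confidence bound for each outage type. It is not the procedure used in the paper: the function names, the outage labels, and the trial counts are hypothetical placeholders chosen only to make the example runnable.

```python
from math import comb


def binom_tail_ge(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def success_lower_bound(successes: int, trials: int, confidence: float = 0.95) -> float:
    """Exact one-sided lower confidence bound on the success probability.

    Solves P(X >= successes | p) = 1 - confidence for p by bisection,
    which is the Clopper-Pearson lower limit.
    """
    if successes == 0:
        return 0.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        # The tail probability grows with p, so a tail below alpha
        # means the bound lies above mid.
        if binom_tail_ge(successes, trials, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical counts (successful fail-overs, injected outages); NOT data from the paper.
hypothetical_trials = {
    "process_crash": (100, 100),
    "network_loss": (98, 100),
}

for outage, (ok, total) in hypothetical_trials.items():
    estimate = ok / total
    bound = success_lower_bound(ok, total)
    print(f"{outage}: estimate={estimate:.3f}, 95% lower bound={bound:.3f}")
```

When every trial succeeds, the bound reduces to the closed form alpha^(1/n), which shows why a campaign with no observed failures still supports only a bounded reliability claim.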
Acknowledgments
This project has received funding from the European Union’s Horizon 2020 Framework Programme for Research and Innovation under grant agreement No. 833088 (InfraStress).
The authors would like to thank Giovanni Mazzeo for his support in the research activity.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Appierto, D., Giuliano, V. (2020). An Evaluation of Pacemaker Cluster Resource Manager Reliability. In: Barolli, L., Hellinckx, P., Natwichai, J. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2019. Lecture Notes in Networks and Systems, vol 96. Springer, Cham. https://doi.org/10.1007/978-3-030-33509-0_11
DOI: https://doi.org/10.1007/978-3-030-33509-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33508-3
Online ISBN: 978-3-030-33509-0
eBook Packages: Engineering, Engineering (R0)