An Evaluation of Pacemaker Cluster Resource Manager Reliability

  • Conference paper
Advances on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC 2019)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 96)

Abstract

Assessing mission-critical systems is non-trivial, even more so when they adopt Commercial-Off-The-Shelf (COTS) software tools that were not developed following custom reliability rules. A satisfactory standard-certification process calls for a model-based estimate of system reliability. Evaluating such a model, however, requires as input reliability data for the subsystem units, which are quite difficult to obtain. A practical issue, in fact, is the general lack of detailed statistical evaluations derived from field experience. While the research community has made an effective contribution on the hardware side, there is still work to do on the software side. An example is the Cluster Resource Manager (CRM) software running on top of clustered systems, which is responsible for orchestrating fail-over operations. To the best of our knowledge, no reliability evaluation based on field experience exists for such a component.

In this work, a particular CRM, namely Pacemaker, was tested to estimate the fail-over success probability when different types of resource outages occur. Pacemaker is one of the most widely adopted CRMs and is used in several Critical Infrastructure (CI) contexts to ensure the high availability of their Industrial Control Systems (ICS). Our experiments were conducted on a real clustered ICS, the Train Management System (TMS) of Hitachi Ansaldo STS.

Notes

  1. http://clusterlabs.org/wiki/Pacemaker.

  2. http://corosync.github.io/corosync/.

  3. https://blitiri.com.ar/p/libfiu/.

Acknowledgments

This project has received funding from the European Union’s Horizon 2020 Framework Programme for Research and Innovation under g.a. No. 833088 (InfraStress).

The authors would like to thank Giovanni Mazzeo for his support in the research activity.

Author information

Corresponding author

Correspondence to Vincenzo Giuliano.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Appierto, D., Giuliano, V. (2020). An Evaluation of Pacemaker Cluster Resource Manager Reliability. In: Barolli, L., Hellinckx, P., Natwichai, J. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2019. Lecture Notes in Networks and Systems, vol 96. Springer, Cham. https://doi.org/10.1007/978-3-030-33509-0_11

  • DOI: https://doi.org/10.1007/978-3-030-33509-0_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33508-3

  • Online ISBN: 978-3-030-33509-0

  • eBook Packages: Engineering, Engineering (R0)
