Abstract
Business success of companies heavily depends on the availability and performance of their client applications. Due to modern development paradigms such as DevOps and microservice architectural styles, applications are decoupled into services with complex interactions and dependencies. Although these paradigms enable individual development cycles with reduced delivery times, they cause several challenges to manage the services in distributed systems. One major challenge is to observe and monitor such distributed systems. This paper provides a qualitative study to understand the challenges and good practices in the field of observability and monitoring of distributed systems. In 28 semi-structured interviews with software professionals we discovered increasing complexity and dynamics in that field. Especially observability becomes an essential prerequisite to ensure stable services and further development of client applications. However, the participants mentioned a discrepancy in the awareness regarding the importance of the topic, both from the management as well as from the developer perspective. Besides technical challenges, we identified a strong need for an organizational concept including strategy, roles and responsibilities. Our results support practitioners in developing and implementing systematic observability and monitoring for distributed systems.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Aceto, G., Botta, A., de Donato, W., Pescapè, A.: Cloud monitoring: a survey. Comput. Netw. 57(9), 2093–2115 (2013)
Alhamazani, K., et al.: An overview of the commercial cloud monitoring tools: research dimensions, design issues, and state-of-the-art. Computing 97(4), 357–377 (2015)
Beyer, B., Jones, C., Petoff, J., Murphy, N.R.: Site Reliability Engineering: How Google Runs Production Systems. O’Reilly Media Inc., Sebastopol (2016)
Colville, R.J.: CMDB or configuration database: know the difference (2006)
Fatema, K., Emeakaroha, V.C., Healy, P.D., Morrison, J.P., Lynn, T.: A survey of cloud monitoring tools: taxonomy, capabilities and objectives. J. Parallel Distrib. Comput. 74(10), 2918–2933 (2014)
Gamez-Diaz, A., Fernandez, P., Ruiz-Cortes, A.: An analysis of RESTful APIs offerings in the industry. In: Maximilien, M., Vallecillo, A., Wang, J., Oriol, M. (eds.) ICSOC 2017. LNCS, vol. 10601, pp. 589–604. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69035-3_43
Gopal, M.: Modern Control System Theory, 2nd edn. Halsted Press, New York (1993)
Gupta, M., Mandal, A., Dasgupta, G., Serebrenik, A.: Runtime monitoring in continuous deployment by differencing execution behavior model. In: Pahl, C., Vukovic, M., Yin, J., Yu, Q. (eds.) ICSOC 2018. LNCS, vol. 11236, pp. 812–827. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03596-9_58
Heger, C., van Hoorn, A., Mann, M., Okanovic, D.: Application performance management: state of the art and challenges for the future. In: Proceedings of the 8th ACM/SPEC International Conference on Performance Engineering (ICPE 2017). ACM (2017)
IEEE: IEEE Standard Glossary of Software Engineering Terminology (1990). https://ieeexplore.ieee.org/document/159342
Johng, H., Kim, D., Hill, T., Chung, L.: Estimating the performance of cloud-based systems using benchmarking and simulation in a complementary manner. In: Pahl, C., Vukovic, M., Yin, J., Yu, Q. (eds.) ICSOC 2018. LNCS, vol. 11236, pp. 576–591. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03596-9_42
Kinsella, J.: The cloud complexity gap: making software more intelligent to address complex infrastructure. https://www.cloudcomputing-news.net/news/2015/jun/17/cloud-complexity-gap-making-software-more-intelligent-address-complex-infrastructure/
Knoche, H., Hasselbring, W.: Drivers and barriers for microservice adoption–a survey among professionals in Germany. Enterp. Model. Inf. Syst. Architect. (EMISAJ)–Int. J. Conceptual Model. 14(1), 1–35 (2019)
Lin, J., Chen, P., Zheng, Z.: Microscope: pinpoint performance issues with causal graphs in micro-service environments. In: Pahl, C., Vukovic, M., Yin, J., Yu, Q. (eds.) ICSOC 2018. LNCS, vol. 11236, pp. 3–20. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03596-9_1
Mayring, P.: Qualitative Content Analysis: Theoretical Foundation, Basic Procedures and Software Solution (2014)
Natu, M., Ghosh, R.K., Shyamsundar, R.K., Ranjan, R.: Holistic performance monitoring of hybrid clouds: complexities and future directions. IEEE Cloud Comput. 3(1), 72–81 (2016)
Niedermaier, S., Koetter, F., Freymann, A., Wagner, S.: Interview guideline on observability and monitoring of distributed systems (2019). https://doi.org/10.5281/zenodo.3346579
Picoreti, R., Pereira do Carmo, A., Mendonça de Queiroz, F., Salles Garcia, A., Frizera Vassallo, R., Simeonidou, D.: Multilevel observability in cloud orchestration. In: 2018 IEEE 16th International Conference on DASC/PiCom/DataCom/CyberSciTech, pp. 776–784, August 2018
Runeson, P., Höst, M.: Guidelines for conducting and reporting case study research in software engineering. Empirical Softw. Eng. 14(2), 131 (2008)
Sambasivan, R.R., Shafer, I., Mace, J., Sigelman, B.H., Fonseca, R., Ganger, G.R.: Principled workflow-centric tracing of distributed systems. In: Proceedings of the Seventh ACM Symposium on Cloud Computing, pp. 401–414. ACM (2016)
Sfondrini, N., Motta, G., Longo, A.: Public cloud adoption in multinational companies: a survey. In: 2018 IEEE International Conference on Services Computing (SCC), pp. 177–184, July 2018
Singer, J., Sim, S.E., Lethbridge, T.C.: Software engineering data collection for field studies. In: Shull, F., Singer, J., Sjøberg, D.I.K. (eds.) Guide to Advanced Empirical Software Engineering, pp. 9–34. Springer, London (2008). https://doi.org/10.1007/978-1-84800-044-5_1
Sun, C., Li, M., Jia, J., Han, J.: Constraint-based model-driven testing of web services for behavior conformance. In: Pahl, C., Vukovic, M., Yin, J., Yu, Q. (eds.) ICSOC 2018. LNCS, vol. 11236, pp. 543–559. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03596-9_40
Yang, Y., Wang, L., Gu, J., Li, Y.: Transparently capturing execution path of service/job request processing. In: Pahl, C., Vukovic, M., Yin, J., Yu, Q. (eds.) ICSOC 2018. LNCS, vol. 11236, pp. 879–887. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03596-9_63
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Niedermaier, S., Koetter, F., Freymann, A., Wagner, S. (2019). On Observability and Monitoring of Distributed Systems – An Industry Interview Study. In: Yangui, S., Bouassida Rodriguez, I., Drira, K., Tari, Z. (eds) Service-Oriented Computing. ICSOC 2019. Lecture Notes in Computer Science(), vol 11895. Springer, Cham. https://doi.org/10.1007/978-3-030-33702-5_3
Download citation
DOI: https://doi.org/10.1007/978-3-030-33702-5_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33701-8
Online ISBN: 978-3-030-33702-5
eBook Packages: Computer ScienceComputer Science (R0)