On Observability and Monitoring of Distributed Systems – An Industry Interview Study

  • Conference paper
Service-Oriented Computing (ICSOC 2019)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11895)

Abstract

The business success of companies depends heavily on the availability and performance of their client applications. With modern development paradigms such as DevOps and microservice architectural styles, applications are decoupled into services with complex interactions and dependencies. Although these paradigms enable individual development cycles with reduced delivery times, they make the services in distributed systems harder to manage. One major challenge is observing and monitoring such distributed systems. This paper presents a qualitative study of the challenges and good practices in the field of observability and monitoring of distributed systems. In 28 semi-structured interviews with software professionals, we found increasing complexity and dynamics in that field. Observability in particular is becoming an essential prerequisite for ensuring stable services and the further development of client applications. However, the participants reported a discrepancy in awareness of the topic's importance, from both the management and the developer perspective. Besides technical challenges, we identified a strong need for an organizational concept that includes strategy, roles, and responsibilities. Our results support practitioners in developing and implementing systematic observability and monitoring for distributed systems.

Corresponding author

Correspondence to Falko Koetter.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Niedermaier, S., Koetter, F., Freymann, A., Wagner, S. (2019). On Observability and Monitoring of Distributed Systems – An Industry Interview Study. In: Yangui, S., Bouassida Rodriguez, I., Drira, K., Tari, Z. (eds) Service-Oriented Computing. ICSOC 2019. Lecture Notes in Computer Science, vol 11895. Springer, Cham. https://doi.org/10.1007/978-3-030-33702-5_3

  • DOI: https://doi.org/10.1007/978-3-030-33702-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33701-8

  • Online ISBN: 978-3-030-33702-5

  • eBook Packages: Computer Science, Computer Science (R0)
