
Benefits and Drawbacks of Representing and Analyzing Source Code and Software Engineering Artifacts with Graph Databases

  • Conference paper
Software Quality: The Complexity and Challenges of Software Engineering and Software Quality in the Cloud (SWQD 2019)

Abstract

Source code and related artifacts of software systems encode valuable expert knowledge accumulated over many person-years of development. Analyzing software systems and extracting this knowledge requires processing the source code and reconstructing structure and dependency information. In analysis projects over the last years, we have created tools and services using graph databases for representing and analyzing source code and other software engineering artifacts as well as their dependencies. Graph databases such as Neo4j are optimized for storing, traversing, and manipulating data in the form of nodes and relationships. They are scalable, extendable, and can quickly be adapted for different application scenarios. In this paper, we share our insights and experience from five different cases where graph databases have been used as a common solution concept for analyzing source code and related artifacts. They cover a broad spectrum of use cases from industry and research, ranging from lightweight dependency analysis to analyzing the architecture of a large-scale software system with 44 million lines of code. We discuss the benefits and drawbacks of using graph databases in the reported cases. The benefits are related to representing dependencies between source code elements and other artifacts, the support for rapid prototyping of analysis solutions, and the power and flexibility of the graph query language. The drawbacks concern the generic frontends of graph databases and the lack of support for time series data. A summary of application scenarios for using graph databases concludes the paper.
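The node-and-relationship representation described in the abstract can be illustrated with a small in-memory sketch. It is not the authors' tooling and does not use Neo4j; the element names, the `DEPENDS_ON` relationship type, and the helper functions are hypothetical and chosen only for illustration. The sketch stores dependency edges between source code elements and answers a simple "what does this element transitively depend on" question by breadth-first traversal.

```python
from collections import deque

# Hypothetical dependency edges: (source element, relationship type, target element).
EDGES = [
    ("app.Main", "DEPENDS_ON", "app.Service"),
    ("app.Service", "DEPENDS_ON", "lib.Repository"),
    ("lib.Repository", "DEPENDS_ON", "lib.Connection"),
    ("app.Main", "DEPENDS_ON", "lib.Logger"),
]


def build_graph(edges):
    """Index outgoing relationships by (source node, relationship type)."""
    graph = {}
    for source, rel_type, target in edges:
        graph.setdefault((source, rel_type), []).append(target)
    return graph


def transitive_dependencies(graph, start, rel_type="DEPENDS_ON"):
    """Collect every node reachable from `start` via `rel_type` edges (breadth-first)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in graph.get((node, rel_type), []):
            # Skip nodes already visited so that cyclic dependencies terminate.
            if target not in seen and target != start:
                seen.add(target)
                queue.append(target)
    return sorted(seen)


if __name__ == "__main__":
    graph = build_graph(EDGES)
    print(transitive_dependencies(graph, "app.Service"))
```

In a graph database, the same question would typically be written as a declarative query rather than a hand-coded traversal; in Cypher, for example, a variable-length pattern such as `(a)-[:DEPENDS_ON*]->(b)` matches paths of any length.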


Notes

  1. https://neo4j.com

  2. https://jqassistant.org

  3. If not already revealed in previous publications, details about involved industry partners have been omitted due to confidentiality obligations.

Acknowledgements

The research reported in this paper was supported by the Austrian Ministry for Transport, Innovation and Technology, the Federal Ministry for Digital and Economic Affairs, and the Province of Upper Austria in the frame of the COMET center SCCH.

Author information

Correspondence to Rudolf Ramler.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ramler, R. et al. (2019). Benefits and Drawbacks of Representing and Analyzing Source Code and Software Engineering Artifacts with Graph Databases. In: Winkler, D., Biffl, S., Bergsmann, J. (eds) Software Quality: The Complexity and Challenges of Software Engineering and Software Quality in the Cloud. SWQD 2019. Lecture Notes in Business Information Processing, vol 338. Springer, Cham. https://doi.org/10.1007/978-3-030-05767-1_9

  • DOI: https://doi.org/10.1007/978-3-030-05767-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05766-4

  • Online ISBN: 978-3-030-05767-1

  • eBook Packages: Computer Science (R0)
