Benefits and Drawbacks of Representing and Analyzing Source Code and Software Engineering Artifacts with Graph Databases

Ramler, Rudolf; Buchgeher, Georg; Klammer, Claus; Pfeiffer, Michael; Salomon, Christian; Thaller, Hannes; Linsbauer, Lukas

doi:10.1007/978-3-030-05767-1_9

Rudolf Ramler ORCID: orcid.org/0000-0001-9903-6107⁹,
Georg Buchgeher⁹,
Claus Klammer⁹,
Michael Pfeiffer⁹,
Christian Salomon⁹,
Hannes Thaller¹⁰ &
Lukas Linsbauer¹⁰

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 338)

Included in the following conference series:

International Conference on Software Quality

Abstract

Source code and related artifacts of software systems encode valuable expert knowledge accumulated over many person-years of development. Analyzing software systems and extracting this knowledge requires processing the source code and reconstructing structure and dependency information. In analysis projects over the last years, we have created tools and services using graph databases for representing and analyzing source code and other software engineering artifacts as well as their dependencies. Graph databases such as Neo4j are optimized for storing, traversing, and manipulating data in the form of nodes and relationships. They are scalable, extendable, and can quickly be adapted for different application scenarios. In this paper, we share our insights and experience from five different cases where graph databases have been used as a common solution concept for analyzing source code and related artifacts. They cover a broad spectrum of use cases from industry and research, ranging from lightweight dependency analysis to analyzing the architecture of a large-scale software system with 44 million lines of code. We discuss the benefits and drawbacks of using graph databases in the reported cases. The benefits are related to representing dependencies between source code elements and other artifacts, the support for rapid prototyping of analysis solutions, and the power and flexibility of the graph query language. The drawbacks concern the generic frontends of graph databases and the lack of support for time series data. A summary of application scenarios for using graph databases concludes the paper.
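To make the idea of "nodes and relationships" concrete, the following is a minimal sketch in plain Python of the kind of lightweight dependency analysis the abstract mentions. It does not use Neo4j or any graph database API. The element names, the relationship types (`DEPENDS_ON`, `TESTS`), and the function names are hypothetical and chosen only for illustration; they are not taken from the paper. In a graph database, a traversal like this would typically be written as a single variable-length path query rather than as explicit code.

```python
from collections import deque

# Hypothetical edges: (source element, relationship type, target element).
# None of these names come from the paper; they only illustrate the data model.
EDGES = [
    ("OrderService", "DEPENDS_ON", "OrderRepository"),
    ("OrderService", "DEPENDS_ON", "PaymentClient"),
    ("OrderRepository", "DEPENDS_ON", "Database"),
    ("PaymentClient", "DEPENDS_ON", "HttpClient"),
    ("OrderServiceTest", "TESTS", "OrderService"),
]


def build_adjacency(edges, rel_type):
    """Map each source node to its targets, keeping only one relationship type."""
    adjacency = {}
    for source, rel, target in edges:
        if rel == rel_type:
            adjacency.setdefault(source, []).append(target)
    return adjacency


def transitive_dependencies(edges, start, rel_type="DEPENDS_ON"):
    """Return every node reachable from `start` via `rel_type` edges (breadth-first)."""
    adjacency = build_adjacency(edges, rel_type)
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            # Skip nodes already visited and guard against cycles back to the start.
            if neighbor not in seen and neighbor != start:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


if __name__ == "__main__":
    print(sorted(transitive_dependencies(EDGES, "OrderService")))
```

Filtering by relationship type before traversing keeps different kinds of links (code dependencies, test coverage, and so on) in one graph while still allowing each to be analyzed separately.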

Notes

1. https://neo4j.com
2. https://jqassistant.org
3. If not already revealed in previous publications, details about involved industry partners have been omitted due to confidentiality obligations.

Acknowledgements

The research reported in this paper was supported by the Austrian Ministry for Transport, Innovation and Technology, the Federal Ministry for Digital and Economic Affairs, and the Province of Upper Austria in the frame of the COMET center SCCH.

Author information

Authors and Affiliations

Software Competence Center Hagenberg GmbH, Softwarepark 21, 4232, Hagenberg, Austria
Rudolf Ramler, Georg Buchgeher, Claus Klammer, Michael Pfeiffer & Christian Salomon
Johannes Kepler University Linz, Altenberger Street 69, 4040, Linz, Austria
Hannes Thaller & Lukas Linsbauer

Corresponding author

Correspondence to Rudolf Ramler.

Editor information

Editors and Affiliations

Vienna University of Technology, Vienna, Austria
Dietmar Winkler
Vienna University of Technology, Vienna, Austria
Stefan Biffl
Software Quality Lab GmbH, Linz, Austria
Johannes Bergsmann

About this paper

Cite this paper

Ramler, R. et al. (2019). Benefits and Drawbacks of Representing and Analyzing Source Code and Software Engineering Artifacts with Graph Databases. In: Winkler, D., Biffl, S., Bergsmann, J. (eds) Software Quality: The Complexity and Challenges of Software Engineering and Software Quality in the Cloud. SWQD 2019. Lecture Notes in Business Information Processing, vol 338. Springer, Cham. https://doi.org/10.1007/978-3-030-05767-1_9

DOI: https://doi.org/10.1007/978-3-030-05767-1_9
Published: 11 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05766-4
Online ISBN: 978-3-030-05767-1
eBook Packages: Computer Science, Computer Science (R0)
