Combining Clustering and Classification for Software Quality Evaluation

Papas, Diomidis; Tjortjis, Christos

doi:10.1007/978-3-319-07064-3_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8445)

Included in the following conference series: Hellenic Conference on Artificial Intelligence

Abstract

Source code and metric mining have been used successfully to assist with software quality evaluation. This paper presents a data mining approach that clusters Java classes and then classifies the extracted clusters in order to assess internal software quality. We use Java classes as entities and static metrics as attributes for data mining. We identify outliers and apply K-means clustering to establish clusters of classes. Outliers indicate potentially fault-prone classes, whilst clusters are examined to establish their common characteristics. Subsequently, we apply C4.5 to build classification trees that identify the metrics which determine cluster membership. We evaluate the proposed approach on two well-known open source software systems, jEdit and Apache Geronimo. The results consolidate key findings from previous work and indicate that combining clustering with classification produces better results than standalone clustering.
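The pipeline outlined in the abstract (outlier identification, K-means clustering, then a classification tree over the cluster labels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the metric rows are invented, the z-score rule for outliers and the min-max scaling are assumptions the abstract does not specify, and `best_split` finds a single information-gain split in the spirit of C4.5 rather than running C4.5 itself (which uses gain ratio and builds a full, pruned tree).

```python
import math
from collections import Counter

METRIC_NAMES = ("loc", "methods", "coupling")
# Hypothetical per-class static metrics, invented for illustration only.
METRICS = {
    "Buffer": (120.0, 10.0, 3.0),
    "Token": (100.0, 8.0, 2.0),
    "Marker": (110.0, 9.0, 3.0),
    "Parser": (900.0, 40.0, 12.0),
    "View": (950.0, 45.0, 14.0),
    "Registry": (1000.0, 42.0, 13.0),
    "GodClass": (9000.0, 300.0, 60.0),
}

def find_outliers(data, z_limit=2.0):
    """Flag classes whose z-score exceeds z_limit on any metric (assumed rule)."""
    flagged = set()
    for j in range(len(METRIC_NAMES)):
        column = [row[j] for row in data.values()]
        mean = sum(column) / len(column)
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
        if std > 0:
            flagged.update(n for n, row in data.items()
                           if abs(row[j] - mean) / std > z_limit)
    return sorted(flagged)

def min_max_scale(rows):
    """Rescale every metric to [0, 1] so no single metric dominates distances."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                  for v, lo, hi in zip(row, lows, highs)) for row in rows]

def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points,
                             key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda i: math.dist(p, centroids[i]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_split(rows, labels):
    """Return (gain, metric, threshold) of the single best binary split."""
    best = (0.0, None, None)
    base = entropy(labels)
    for j, name in enumerate(METRIC_NAMES):
        values = sorted({row[j] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [lab for row, lab in zip(rows, labels) if row[j] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[j] > threshold]
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / len(labels)
            if gain > best[0]:
                best = (gain, name, threshold)
    return best

outliers = find_outliers(METRICS)
kept = {n: r for n, r in METRICS.items() if n not in outliers}
names, rows = list(kept), list(kept.values())
labels = kmeans(min_max_scale(rows), k=2)
clusters = dict(zip(names, labels))
gain, metric, threshold = best_split(rows, labels)
print("outliers:", outliers)
print("clusters:", clusters)
print("cluster membership is decided by:", metric, "<=", threshold)
```

Outliers are removed before clustering because K-means centroids are sensitive to extreme values. The final split is computed on the raw (unscaled) metrics so that the reported threshold stays interpretable in the metric's own units.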

Author information

Authors and Affiliations

Diomidis Papas: Department of Computer Science & Engineering, University of Ioannina, P.O. Box 1186, GR 45110, Ioannina, Greece
Christos Tjortjis: School of Science & Technology, International Hellenic University, 14th km Thessaloniki – Moudania, 57001, Thermi, Greece

Editor information

Editors and Affiliations

Aristidis Likas: Department of Computer Science, University of Ioannina, GR 45110, Ioannina, Greece
Konstantinos Blekas: Department of Computer Science, University of Ioannina, P.O. Box 1186, 45110, Ioannina, Greece
Dimitris Kalles: Hellenic Open University, GR 26335, Peribola, Patras, Greece

Cite this paper

Papas, D., Tjortjis, C. (2014). Combining Clustering and Classification for Software Quality Evaluation. In: Likas, A., Blekas, K., Kalles, D. (eds) Artificial Intelligence: Methods and Applications. SETN 2014. Lecture Notes in Computer Science, vol 8445. Springer, Cham. https://doi.org/10.1007/978-3-319-07064-3_22

DOI: https://doi.org/10.1007/978-3-319-07064-3_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07063-6
Online ISBN: 978-3-319-07064-3
eBook Packages: Computer Science, Computer Science (R0)