Abstract
Software plays a major role in many organizations, and organizational success depends partly on the quality of the software used. In recent years, many researchers have recognized that statistical classification techniques are well suited to developing software quality prediction models. Different statistical software quality models, using complexity metrics as early indicators of software quality, have been proposed in the past. At a high level, the problem of software categorization is to classify software modules as fault-prone or non-fault-prone. A learner is given a set of training modules and the corresponding class labels (i.e., fault-prone or non-fault-prone) and outputs a classifier. The classifier then takes an unlabeled module (i.e., a hitherto-unseen module) and assigns it to a class. The focus of this paper is to study some selected classification techniques widely used for software categorization. Practitioners are faced with a body of approaches and literature that gives conflicting advice about the usefulness of these classification approaches. The techniques evaluated in this paper include principal component analysis, linear discriminant analysis, multiple linear regression, logistic regression, support vector machines and finite mixture models. Moreover, we propose a Bayesian approach based on finite Dirichlet mixture models. We evaluate these approaches experimentally using a real data set. Our experimental results show that different algorithms lead to results that differ in a statistically significant way.
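The learner/classifier setup described in the abstract can be illustrated with a minimal sketch. It uses logistic regression, one of the techniques listed above, fitted by plain batch gradient descent. The two features (normalized lines of code and normalized cyclomatic complexity), the toy training values, and all function names are hypothetical and chosen only for illustration; they are not the chapter's data set or implementation.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Learner: fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the linear score
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def classify(w, b, x, threshold=0.5):
    """Classifier: 1 = fault-prone, 0 = non-fault-prone."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0

# Hypothetical training modules: (normalized LOC, normalized cyclomatic complexity).
X_train = [
    (0.90, 0.80), (0.80, 0.90), (0.70, 0.75), (0.85, 0.60),  # labeled fault-prone
    (0.10, 0.20), (0.20, 0.10), (0.30, 0.25), (0.15, 0.40),  # labeled non-fault-prone
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_logistic(X_train, y_train)

# Hitherto-unseen modules are assigned to a class by the trained classifier.
large_complex_module = (0.95, 0.90)
small_simple_module = (0.05, 0.10)
print(classify(w, b, large_complex_module), classify(w, b, small_simple_module))
```

Any of the other techniques named in the abstract could stand in for `train_logistic` here; the split between a learner that consumes labeled modules and a classifier that labels unseen ones stays the same.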
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Wang, J.H., Bouguila, N., Bdiri, T. (2010). Empirical Evaluation of Selected Algorithms for Complexity-Based Classification of Software Modules and a New Model. In: Sgurev, V., Hadjiski, M., Kacprzyk, J. (eds) Intelligent Systems: From Theory to Practice. Studies in Computational Intelligence, vol 299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13428-9_5
DOI: https://doi.org/10.1007/978-3-642-13428-9_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13427-2
Online ISBN: 978-3-642-13428-9
eBook Packages: Engineering, Engineering (R0)