Abstract
Metric-driven classification models identify software components with user-specifiable properties, such as those likely to be fault-prone, have high development effort, or have faults in a certain class. These models are generated automatically from past metric data, and they are scalable to large systems and calibratable to different projects. These models serve as extensible integration frameworks for software metrics because they allow the addition of new metrics and integrate symbolic and numeric data from all four measurement abstractions. In our past work, we developed and evaluated techniques for generating tree-based classification models. In this paper, we investigate a technique for generating network-based classification models. The principle underlying the tree-based models is partitioning, while the principle underlying the network-based models is pattern matching. Tree-based models prune away information and can be decomposed, while network-based models retain all information and tend to be more complex. We evaluate the predictive accuracy of network-based models and compare them to the tree-based models.
The evaluative study uses metric data from 16 NASA production systems ranging in size from 3000 to 112,000 source lines. The goal of the classification models is to identify the software components in the systems that had “high” development faults or effort, where “high” is defined as falling in the uppermost quartile relative to past data. The models are derived from 74 candidate metrics that capture a range of information about the components: development effort, faults, changes, design style, and implementation style. A total of 1920 tree- and network-based models are automatically generated, and their predictive accuracies are compared in terms of correctness, completeness, and consistency using a non-parametric analysis of variance model. On average, the predictions from the network-based models had 89.6% correctness, 69.1% completeness, and 79.5% consistency, while those from the tree-based models had 82.2% correctness, 56.3% completeness, and 74.5% consistency. The network-based models had statistically significantly higher correctness and completeness than the tree-based models, but the two did not differ statistically in consistency. Capabilities to generate metric-driven classification models will be supported in the Amadeus measurement-driven analysis and feedback system.
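The classification target and the three evaluation measures can be illustrated with a small sketch. It is not the authors' implementation. The abstract does not define correctness, completeness, or consistency, so the definitions below are assumptions: correctness as the fraction of all components classified correctly, completeness as the fraction of truly "high" components that were predicted "high", and consistency as the fraction of predicted-"high" components that truly were "high". All function names are hypothetical, and the quartile interpolation method is likewise an assumption.

```python
from statistics import quantiles


def upper_quartile_threshold(past_values):
    """Return the third-quartile cut point of past data (hypothetical helper)."""
    # quantiles(..., n=4) yields the three quartile cut points; index 2 is Q3.
    # The "inclusive" interpolation method is an assumption, not from the paper.
    return quantiles(past_values, n=4, method="inclusive")[2]


def label_high(values, threshold):
    """Mark a component "high" when its value exceeds the past-data threshold."""
    return [v > threshold for v in values]


def evaluate(predicted, actual):
    """Compute the three measures under the assumed definitions stated above."""
    pairs = list(zip(predicted, actual))
    true_high = sum(1 for p, a in pairs if p and a)
    correct = sum(1 for p, a in pairs if p == a)
    predicted_high = sum(1 for p, _ in pairs if p)
    actual_high = sum(1 for _, a in pairs if a)
    return {
        # Fraction of all components classified correctly (assumed definition).
        "correctness": correct / len(pairs) if pairs else None,
        # Fraction of truly "high" components found (assumed definition).
        "completeness": true_high / actual_high if actual_high else None,
        # Fraction of predicted-"high" components that were "high" (assumed).
        "consistency": true_high / predicted_high if predicted_high else None,
    }


# Made-up fault counts, for illustration only.
past_faults = [0, 1, 1, 2, 2, 3, 5, 8]
threshold = upper_quartile_threshold(past_faults)
actual = label_high([9, 6, 1, 0, 2], threshold)
predicted = [True, False, False, False, True]  # output of some model
scores = evaluate(predicted, actual)
```

Under these assumed definitions, a model can score well on correctness while missing many "high" components, which is why the three measures are reported separately.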
This work was supported in part by the National Science Foundation under grant CCR-8704311 with cooperation from the Defense Advanced Research Projects Agency under ARPA order 6108, program code 7T10; National Aeronautics and Space Administration under grant NSG-5123; National Science Foundation under grant DCR-8521398; University of California under the MICRO program; Computer Sciences Corporation; Hughes Aircraft; and TRW.
Copyright information
© 1991 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Selby, R.W., Madsen, R.K. (1991). Metric-driven classification analysis. In: van Lamsweerde, A., Fuggetta, A. (eds) ESEC '91. ESEC 1991. Lecture Notes in Computer Science, vol 550. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3540547428_54
DOI: https://doi.org/10.1007/3540547428_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-54742-6
Online ISBN: 978-3-540-46446-4
eBook Packages: Springer Book Archive