Metric-driven classification analysis

Selby, Richard W.; Madsen, R. Kent

doi:10.1007/3540547428_54

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 550)

Included in the following conference series:

European Software Engineering Conference

Abstract

Metric-driven classification models identify software components with user-specifiable properties, such as those likely to be fault-prone, have high development effort, or have faults in a certain class. These models are generated automatically from past metric data, and they are scalable to large systems and calibratable to different projects. These models serve as extensible integration frameworks for software metrics because they allow the addition of new metrics and integrate symbolic and numeric data from all four measurement abstractions. In our past work, we developed and evaluated techniques for generating tree-based classification models. In this paper, we investigate a technique for generating network-based classification models. The principle underlying the tree-based models is partitioning, while the principle underlying the network-based models is pattern matching. Tree-based models prune away information and can be decomposed, while network-based models retain all information and tend to be more complex. We evaluate the predictive accuracy of network-based models and compare them to the tree-based models.

The evaluative study uses metric data from 16 NASA production systems ranging in size from 3,000 to 112,000 source lines. The goal of the classification models is to identify the software components in the systems that had “high” development faults or effort, where “high” is defined as falling in the uppermost quartile relative to past data. The models are derived from 74 candidate metrics that capture a range of information about the components: development effort, faults, changes, design style, and implementation style. A total of 1920 tree- and network-based models are generated automatically, and their predictive accuracies are compared in terms of correctness, completeness, and consistency using a non-parametric analysis of variance model. On average, the predictions from the network-based models had 89.6% correctness, 69.1% completeness, and 79.5% consistency, while those from the tree-based models had 82.2% correctness, 56.3% completeness, and 74.5% consistency. The network-based models had statistically higher correctness and completeness than the tree-based models, but the two did not differ statistically in consistency. Capabilities to generate metric-driven classification models will be supported in the Amadeus measurement-driven analysis and feedback system.
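As a rough illustration of the evaluation described above, the sketch below labels components as “high” using the uppermost quartile of past data and then scores a set of predictions. The abstract does not define the three accuracy measures, so the definitions here are assumptions: correctness as the fraction of components classified correctly, completeness as the fraction of truly high components that were predicted high, and consistency as the fraction of predicted-high components that were truly high. The function names and the quartile convention are hypothetical, and the sketch does not reproduce the authors' tree- or network-based model generation.

```python
from statistics import quantiles


def high_threshold(past_values):
    """Third-quartile cut point of past data (quartile convention is an assumption)."""
    return quantiles(past_values, n=4, method="inclusive")[2]


def label_high(values, threshold):
    """Mark each component as 'high' when its value exceeds the threshold."""
    return [v > threshold for v in values]


def evaluate(actual, predicted):
    """Score predictions under the assumed definitions of the three measures."""
    pairs = list(zip(actual, predicted))
    true_high = sum(a and p for a, p in pairs)               # predicted high, truly high
    true_low = sum((not a) and (not p) for a, p in pairs)    # predicted low, truly low
    n_actual_high = sum(actual)
    n_predicted_high = sum(predicted)
    return {
        "correctness": (true_high + true_low) / len(pairs),
        "completeness": true_high / n_actual_high if n_actual_high else None,
        "consistency": true_high / n_predicted_high if n_predicted_high else None,
    }


# Hypothetical fault counts from past components and from new components.
past_faults = [1, 2, 3, 4, 5, 6, 7, 8]
new_faults = [5, 7, 9, 2]

threshold = high_threshold(past_faults)
actual = label_high(new_faults, threshold)
predicted = [True, True, False, False]  # stand-in for a model's output
scores = evaluate(actual, predicted)
```

Any classifier, tree-based or network-based, could supply the `predicted` list; the scoring step is independent of how the model was built.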

This work was supported in part by the National Science Foundation under grant CCR-8704311 with cooperation from the Defense Advanced Research Projects Agency under ARPA order 6108, program code 7T10; National Aeronautics and Space Administration under grant NSG-5123; National Science Foundation under grant DCR-8521398; University of California under the MICRO program; Computer Sciences Corporation; Hughes Aircraft; and TRW.

Author information

Authors and Affiliations

Department of Information and Computer Science, University of California, Irvine, California 92717
Richard W. Selby & R. Kent Madsen

Editor information

Axel van Lamsweerde, Alfonso Fugetta

About this paper

Cite this paper

Selby, R.W., Madsen, R.K. (1991). Metric-driven classification analysis. In: van Lamsweerde, A., Fugetta, A. (eds) ESEC '91. ESEC 1991. Lecture Notes in Computer Science, vol 550. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3540547428_54

DOI: https://doi.org/10.1007/3540547428_54
Published: 02 July 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-54742-6
Online ISBN: 978-3-540-46446-4
eBook Packages: Springer Book Archive
