Metric-driven classification analysis

Conference paper
ESEC '91 (ESEC 1991)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 550))

Abstract

Metric-driven classification models identify software components with user-specifiable properties, such as those likely to be fault-prone, to require high development effort, or to contain faults of a particular class. These models are generated automatically from past metric data, and they scale to large systems and can be calibrated to different projects. They serve as extensible integration frameworks for software metrics because they allow the addition of new metrics and integrate symbolic and numeric data from all four measurement abstractions. In our past work, we developed and evaluated techniques for generating tree-based classification models. In this paper, we investigate a technique for generating network-based classification models. The principle underlying the tree-based models is partitioning, while the principle underlying the network-based models is pattern matching. Tree-based models prune away information and can be decomposed, while network-based models retain all information and tend to be more complex. We evaluate the predictive accuracy of the network-based models and compare them to the tree-based models.
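The partitioning principle behind the tree-based models can be sketched with a toy classifier. The metrics ("changes", "source_lines") and the thresholds below are hypothetical illustrations, not values taken from the paper:

```python
# Toy illustration of the partitioning principle behind tree-based
# classification models: each node tests one metric against a threshold,
# and every component follows exactly one path down to a leaf label.
# The metric names and cutoffs here are hypothetical examples.

def classify(component):
    """Two-level classification tree predicting 'high' vs 'low' faults."""
    if component["changes"] > 10:            # first partition: change count
        if component["source_lines"] > 500:  # refine the partition by size
            return "high"
    return "low"

print(classify({"changes": 12, "source_lines": 800}))  # -> high
print(classify({"changes": 3,  "source_lines": 800}))  # -> low
```

In the paper's framework such trees are generated automatically from past metric data rather than written by hand; a network-based model would instead match a component's whole metric vector against learned patterns rather than partition on one metric at a time.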

The evaluative study uses metric data from 16 NASA production systems ranging in size from 3000 to 112,000 source lines. The goal of the classification models is to identify the software components in the systems that had "high" development faults or effort, where "high" is defined to be in the uppermost quartile relative to past data. The models are derived from 74 candidate metrics that capture a multiplicity of information about the components: development effort, faults, changes, design style, and implementation style. A total of 1920 tree- and network-based models are automatically generated, and their predictive accuracies are compared in terms of correctness, completeness, and consistency using a non-parametric analysis of variance model. On average, the predictions from the network-based models had 89.6% correctness, 69.1% completeness, and 79.5% consistency, while those from the tree-based models had 82.2% correctness, 56.3% completeness, and 74.5% consistency. The network-based models had statistically higher correctness and completeness than the tree-based models, but the two did not differ statistically in terms of consistency. Capabilities to generate metric-driven classification models will be supported in the Amadeus measurement-driven analysis and feedback system.
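As a reading aid, the three accuracy measures can be computed from a set of binary predictions as below. The abstract does not reproduce the exact formulas, so the standard interpretations are assumed: correctness as overall accuracy, completeness as recall of the actual "high" components, and consistency as precision of the "high" predictions.

```python
# Sketch of the three accuracy measures, under assumed (standard)
# definitions:
#   correctness  - fraction of all components classified correctly
#   completeness - fraction of actual "high" components that were flagged
#   consistency  - fraction of flagged components that actually were "high"

def accuracy_measures(predicted, actual):
    """predicted/actual: parallel lists of booleans (True = 'high')."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # correctly flagged
    tn = sum(1 for p, a in pairs if not p and not a)  # correctly passed over
    correctness = (tp + tn) / len(pairs)
    completeness = tp / sum(1 for a in actual if a)
    consistency = tp / sum(1 for p in predicted if p)
    return correctness, completeness, consistency

pred = [True, True, False, False, True]
act  = [True, False, False, False, True]
print(accuracy_measures(pred, act))
```

Under these definitions a model can trade the three measures off against one another, which is why the study reports all three rather than a single accuracy figure.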

This work was supported in part by the National Science Foundation under grant CCR-8704311, with cooperation from the Defense Advanced Research Projects Agency under ARPA Order 6108, program code 7T10; the National Aeronautics and Space Administration under grant NSG-5123; the National Science Foundation under grant DCR-8521398; the University of California under the MICRO program; Computer Sciences Corporation; Hughes Aircraft; and TRW.

Editor information

Axel van Lamsweerde, Alfonso Fugetta

Copyright information

© 1991 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Selby, R.W., Madsen, R.K. (1991). Metric-driven classification analysis. In: van Lamsweerde, A., Fugetta, A. (eds) ESEC '91. ESEC 1991. Lecture Notes in Computer Science, vol 550. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3540547428_54

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-54742-6

  • Online ISBN: 978-3-540-46446-4
