Regularities in Learning Defect Predictors

Conference paper in Product-Focused Software Process Improvement (PROFES 2010)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6156)

Abstract

Collecting large, consistent data sets of real-world software projects from a single source is problematic. In this study, we show that bug reports need not come from local projects in order to learn defect prediction models: data imported from other sites can be made suitable for predicting defects at the local site. In addition to our previous work on commercial software, we now explore the open-source domain, using two versions of an open-source anti-virus tool (ClamAV) and a subset of bugs in two versions of the GNU gcc compiler, to identify the regularities in learning predictors for a different domain. We conclude that there are surprisingly uniform aspects of software that can be discovered as simple, repeated patterns in local or imported data, using just a handful of examples.
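For readers who want to experiment with the idea, the sketch below illustrates the cross-site setup the abstract describes: a defect predictor is trained only on data imported from another project, then applied to the local project. The synthetic metrics generator, the three-feature design, and the Naive Bayes learner are illustrative assumptions for this sketch, not the authors' exact method or data.

```python
# Minimal sketch of cross-project defect prediction: learn from "imported"
# data, evaluate on the "local" project. Synthetic data stands in for real
# static code metrics (placeholder columns: LOC, cyclomatic complexity,
# Halstead volume); all names here are hypothetical.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)

def synthetic_project(n_modules, defect_rate):
    """Generate placeholder (metrics, label) pairs for one project."""
    y = rng.random(n_modules) < defect_rate
    # Assumed regularity: defective modules tend to be larger/more complex.
    shift = np.where(y[:, None], 1.5, 0.0)
    X = rng.normal(loc=shift, scale=1.0, size=(n_modules, 3))
    return X, y.astype(int)

# "Imported" data from a different site, and the local project under study.
X_imported, y_imported = synthetic_project(1000, defect_rate=0.2)
X_local, y_local = synthetic_project(200, defect_rate=0.2)

# Learn the defect predictor from imported data only...
model = GaussianNB().fit(X_imported, y_imported)

# ...then evaluate it on the local project's modules.
pred = model.predict(X_local)
print("probability of detection (recall):", recall_score(y_local, pred))
```

If the two projects share the kind of regularities the paper reports, a predictor trained this way detects a useful fraction of the local defects despite never seeing local bug reports.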

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Turhan, B., Bener, A., Menzies, T. (2010). Regularities in Learning Defect Predictors. In: Ali Babar, M., Vierimaa, M., Oivo, M. (eds) Product-Focused Software Process Improvement. PROFES 2010. Lecture Notes in Computer Science, vol 6156. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13792-1_11

  • DOI: https://doi.org/10.1007/978-3-642-13792-1_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13791-4

  • Online ISBN: 978-3-642-13792-1

  • eBook Packages: Computer Science (R0)
