Regularities in Learning Defect Predictors

Conference paper in Product-Focused Software Process Improvement (PROFES 2010)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6156)

Abstract

Collecting large, consistent data sets of real-world software projects from a single source is problematic. In this study, we show that bug reports need not come from local projects in order to learn defect prediction models: data imported from other sites can be made suitable for predicting defects at the local site. In addition to our previous work on commercial software, we now explore the open-source domain, using two versions of an open-source anti-virus tool (ClamAV) and a subset of bugs in two versions of the GNU gcc compiler, to identify the regularities in learning predictors for a different domain. We conclude that there are surprisingly uniform aspects of software that can be discovered as simple, repeated patterns in local or imported data, using just a handful of examples.
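For readers who want to experiment with the idea, the sketch below illustrates the cross-site setup the abstract describes: a defect predictor is trained only on data imported from another project, then applied to the local project. The synthetic metrics generator, the three-feature design, and the Naive Bayes learner are illustrative assumptions for this sketch, not the authors' exact method or data.

```python
# Minimal sketch of cross-project defect prediction: learn from "imported"
# data, evaluate on the "local" project. Synthetic data stands in for real
# static code metrics (placeholder columns: LOC, cyclomatic complexity,
# Halstead volume); all names here are hypothetical.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)

def synthetic_project(n_modules, defect_rate):
    """Generate placeholder (metrics, label) pairs for one project."""
    y = rng.random(n_modules) < defect_rate
    # Assumed regularity: defective modules tend to be larger/more complex.
    shift = np.where(y[:, None], 1.5, 0.0)
    X = rng.normal(loc=shift, scale=1.0, size=(n_modules, 3))
    return X, y.astype(int)

# "Imported" data from a different site, and the local project under study.
X_imported, y_imported = synthetic_project(1000, defect_rate=0.2)
X_local, y_local = synthetic_project(200, defect_rate=0.2)

# Learn the defect predictor from imported data only...
model = GaussianNB().fit(X_imported, y_imported)

# ...then evaluate it on the local project's modules.
pred = model.predict(X_local)
print("probability of detection (recall):", recall_score(y_local, pred))
```

If the two projects share the kind of regularities the paper reports, a predictor trained this way detects a useful fraction of the local defects despite never seeing local bug reports.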

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Turhan, B., Bener, A., Menzies, T. (2010). Regularities in Learning Defect Predictors. In: Ali Babar, M., Vierimaa, M., Oivo, M. (eds) Product-Focused Software Process Improvement. PROFES 2010. Lecture Notes in Computer Science, vol 6156. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13792-1_11

  • DOI: https://doi.org/10.1007/978-3-642-13792-1_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13791-4

  • Online ISBN: 978-3-642-13792-1

  • eBook Packages: Computer Science (R0)
