Empirical Evaluation of Hunk Metrics as Bug Predictors

  • Conference paper
Software Process and Product Measurement (IWSM 2009)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 5891)

Abstract

Reducing the number of bugs is a crucial issue during software development and maintenance. Software process and product metrics are good indicators of software complexity, and they have been used to build bug prediction models that help developers maintain software quality. In this paper we empirically evaluate the use of hunk metrics as predictors of bugs. We present a bug prediction technique that works at the level of the smallest units of code change, called hunks. We build bug prediction models using random forests, an efficient machine learning classifier. Hunk metrics are used to train the classifier, and each hunk metric is evaluated for its bug prediction capability. Our classifier can classify individual hunks as buggy or bug-free with 86% accuracy, 83% buggy hunk precision and 77% buggy hunk recall. We find that history-based and change-level hunk metrics are better predictors of bugs than code-level hunk metrics.
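
To make the three reported measures concrete, the following is a minimal sketch of how accuracy, buggy hunk precision and buggy hunk recall could be computed from per-hunk labels. It is not the authors' implementation: the function name `buggy_class_metrics` and the toy labels are assumptions made purely for illustration, and the numbers it produces are unrelated to the results reported in the paper.

```python
from typing import Sequence, Tuple


def buggy_class_metrics(
    actual: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float, float]:
    """Return (accuracy, buggy precision, buggy recall).

    True marks a hunk as buggy, False as bug-free. The name and
    signature are hypothetical, chosen only for this sketch.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # buggy, predicted buggy
    tn = sum(1 for a, p in pairs if not a and not p)  # clean, predicted clean
    fp = sum(1 for a, p in pairs if not a and p)      # clean, predicted buggy
    fn = sum(1 for a, p in pairs if a and not p)      # buggy, predicted clean

    accuracy = (tp + tn) / len(pairs)
    # Precision: of the hunks predicted buggy, the fraction that are buggy.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the hunks that are buggy, the fraction that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy labels for eight hypothetical hunks (not data from the paper).
actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]
accuracy, precision, recall = buggy_class_metrics(actual, predicted)
```

Precision and recall are computed for the buggy class only, which matches how the abstract words its figures ("buggy hunk precision", "buggy hunk recall"); accuracy covers both classes.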

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ferzund, J., Ahsan, S.N., Wotawa, F. (2009). Empirical Evaluation of Hunk Metrics as Bug Predictors. In: Abran, A., Braungarten, R., Dumke, R.R., Cuadrado-Gallego, J.J., Brunekreef, J. (eds) Software Process and Product Measurement. IWSM 2009. Lecture Notes in Computer Science, vol 5891. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05415-0_18

  • DOI: https://doi.org/10.1007/978-3-642-05415-0_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05414-3

  • Online ISBN: 978-3-642-05415-0

  • eBook Packages: Computer Science (R0)
