Effective software fault localization using predicted execution results

Abstract

Software has become ubiquitous in our daily lives, and with its increasing functionality and complexity comes a frequently tedious and prolonged debugging process. Of the three activities in program debugging (failure detection, fault localization, and bug fixing), the focus of this paper is on the first, failure detection, under the condition that there is no test oracle that can be used to automatically determine the success or failure of all the executions. More precisely, the outputs of many executions have to be verified manually, or the expected outputs are not even available. We want to determine whether there is a solution that helps programmers predict the execution results, and how good these predicted results are when they are used to help programmers find the locations of bugs. A framework is proposed to reduce the output verification effort by using a strategy based on the Hamming distance or K-Means clustering to predict the results of test executions. These predicted results and the statement coverage of each test case are used to compute the suspiciousness of each statement according to a fault localization technique and to produce a ranking of statements to be examined to locate the bugs. Case studies using 22 programs and seven fault localization techniques were conducted to evaluate the fault localization effectiveness of the proposed framework on 1203 faulty versions, some with a single bug and others with multiple bugs. A discussion of factors that may affect the accuracy of execution result prediction and the resulting fault localization effectiveness is also presented. Our data suggest that, in general, fault localization techniques using predicted execution results can be as effective as, or even more effective than (by examining a smaller number of statements to locate the first faulty statement), the same techniques using execution results verified against the expected outputs.
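The Hamming-distance-based prediction strategy mentioned in the abstract can be illustrated with a simplified sketch: a test whose statement-coverage vector lies close (in Hamming distance) to that of a known failed test is predicted to fail, and all other tests are predicted to pass. The function names, the single reference failed test, and the fixed threshold below are illustrative assumptions, not the paper's exact algorithm.

```python
from typing import List


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions where two equal-length 0/1 coverage vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_results(coverage: List[List[int]], failed_idx: int,
                    threshold: int) -> List[str]:
    """Label each test 'failed' if its statement-coverage vector is within
    `threshold` Hamming distance of the known failed test, else 'passed'."""
    ref = coverage[failed_idx]
    return ["failed" if hamming(v, ref) <= threshold else "passed"
            for v in coverage]


# Toy example: four tests covering five statements (1 = statement covered).
cov = [
    [1, 1, 0, 1, 0],  # known failed test (distance 0 to itself)
    [1, 1, 0, 1, 1],  # distance 1 -> predicted failed
    [0, 0, 1, 0, 1],  # distance 5 -> predicted passed
    [1, 0, 1, 0, 1],  # distance 4 -> predicted passed
]
print(predict_results(cov, failed_idx=0, threshold=1))
# ['failed', 'failed', 'passed', 'passed']
```

The predicted labels would then play the role of verified pass/fail outcomes when computing statement suspiciousness.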

Figures 1–7 (not reproduced in this excerpt)

Notes

  1. In the paper, whenever appropriate, “software,” “application,” and “system” are used interchangeably; “bug” and “fault” are also used interchangeably.

  2. While this paper considers the localization of faults within program statements, the techniques described can be generalized to locate other types of faulty components such as blocks, functions, predicates, c-uses, and p-uses (Horgan and London 1991).

  3. In this paper, “a statement is covered by a test case” means the same as “a statement is executed by a test case.”

  4. A more detailed discussion can be found in Sect. 8: Threats to Validity.

  5. A more detailed discussion can be found in Sect. 8: Threats to Validity.

  6. A few faulty versions in the Siemens and Unix suites do not have 30 distinct failed tests. For each of them, the number of iterations is the same as the number of its failed tests.

  7. Due to space limitations, the average precision and recall with respect to multiple-bug versions of each program using HM- and KM-based techniques are not included in the paper. However, conclusions similar to those for single-bug versions can be derived.

Abbreviations

P: A generic program

T: A generic test set

N_CF: Number of failed test cases that cover the statement

N_UF: Number of failed test cases that do not cover the statement

N_CS: Number of successful test cases that cover the statement

N_US: Number of successful test cases that do not cover the statement

N_C: Total number of test cases that cover the statement

N_U: Total number of test cases that do not cover the statement

N_S: Total number of successful test cases

N_F: Total number of failed test cases

t_f: A failed test case

t_i: A test case in T

HM: Hamming distance

KM: K-Means clustering

\( {\mathcal{X}} \): A fault localization technique discussed in Sect. 2

\( {\mathcal{X}}{\text{-HM}} \): A fault localization technique with HM-based execution result prediction

\( {\mathcal{X}}{\text{-KM}} \): A fault localization technique with KM-based execution result prediction

References

  1. Abreu, R., Zoeteweij, P., Golsteijn, R., & Van Gemund, A. J. C. (2009). A practical evaluation of spectrum-based fault localization. Journal of Systems and Software, 82(11), 1780–1792.

  2. Afshan, S., McMinn, P., & Stevenson, M. (2013). Evolving readable string test inputs using a natural language model to reduce human oracle cost. In Proceedings of IEEE Sixth International Conference on Software Testing, Verification and Validation (ICST), Luxembourg (pp. 352–361).

  3. Agrawal, H., DeMillo, R. A., & Spafford, E. H. (1993). Debugging with dynamic slicing and backtracking. Software—Practice and Experience, 23(6), 589–616.

  4. Agrawal, H., Horgan, J. R., London, S., & Wong, W. E. (1995). Fault localization using execution slices and dataflow tests. In Proceedings of the 6th International Symposium on Software Reliability Engineering, Toulouse, France (pp. 143–151).

  5. Andrews, J. H., Briand, L. C., & Labiche, Y. (2005). Is mutation an appropriate tool for testing experiments? In Proceedings of the 27th International Conference on Software Engineering, St. Louis, Missouri, USA (pp. 402–411).

  6. Bookstein, A., Kulyukin, V. A., & Raita, T. (2002). Generalized Hamming distance. Information Retrieval, 5(4), 353–375.

  7. Cleve, H., & Zeller, A. (2005). Locating causes of program failures. In Proceedings of the 27th International Conference on Software Engineering, St. Louis, Missouri, USA (pp. 342–351).

  8. Do, H., & Rothermel, G. (2006). On the use of mutation faults in empirical assessments of test case prioritization techniques. IEEE Transactions on Software Engineering, 32(9), 733–752.

  9. Everitt, B. S. (1977). The analysis of contingency tables. London: Chapman & Hall.

  10. Freeman, D. (1987). Applied categorical data analysis. New York: Marcel Dekker.

  11. Goodman, L. A. (1984). The analysis of cross-classification data having ordered categories. Cambridge: Harvard University Press.

  12. Hamming, R. W. (1950). Error detecting and error correcting codes. Bell System Technical Journal, 29(2), 147–160.

  13. Harman, M., Kim, S. G., Lakhotia, K., McMinn, P., & Yoo, S. (2010). Optimizing for the number of tests generated in search based test data generation with an application to the oracle cost problem. In Proceedings of the 3rd International Conference on Software Testing, Verification, and Validation Workshops (ICSTW), Paris, France (pp. 182–191).

  14. Hartigan, J. A., & Wong, M. A. (1979). Algorithm AS 136: A K-Means clustering algorithm. Applied Statistics, 28(1), 100–108.

  15. Hierons, R. M. (2009). Verdict functions in testing with a fault domain or test hypotheses. ACM Transactions on Software Engineering and Methodology, 18(4), 14.

  16. Hierons, R. M. (2012). Oracles for distributed testing. IEEE Transactions on Software Engineering, 38(3), 629–641.

  17. Horgan, J. R., & London, S. A. (1991). Data flow coverage and the C language. In Proceedings of the 4th Symposium on Software Testing, Analysis, and Verification, Victoria, British Columbia, Canada (pp. 87–97).

  18. Jeffrey, D., Gupta, N., & Gupta, R. (2008). Fault localization using value replacement. In Proceedings of the International Symposium on Software Testing and Analysis, Seattle, Washington, USA (pp. 167–178).

  19. Jeffrey, D., Gupta, N., & Gupta, R. (2009). Effective and efficient localization of multiple faults using value replacement. In Proceedings of International Conference on Software Maintenance, Edmonton, Canada (pp. 221–230).

  20. Jones, J. A., Bowring, J., & Harrold, M. J. (2007). Debugging in parallel. In Proceedings of the 2007 International Symposium on Software Testing and Analysis, London, UK (pp. 16–26).

  21. Jones, J. A., & Harrold, M. J. (2005). Empirical evaluation of the tarantula automatic fault-localization technique. In Proceedings of the 20th IEEE/ACM Conference on Automated Software Engineering, Long Beach, California, USA (pp. 273–282).

  22. Liu, C., Fei, L., Yan, X., Han, J., & Midkiff, S. P. (2006). Statistical debugging: A hypothesis testing-based approach. IEEE Transactions on Software Engineering, 32(10), 831–848.

  23. Lyle, J. R., & Weiser, M. (1987). Automatic program bug location by program slicing. In Proceedings of the 2nd International Conference on Computers and Applications, Beijing, China (pp. 877–883).

  24. Machado, P. D. L., & Andrade, W. L. (2007). The oracle problem for testing against quantified properties. In Proceedings of the 7th International Conference on Quality Software, Portland, Oregon, USA (pp. 415–418).

  25. MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1, pp. 281–297).

  26. McMinn, P., Stevenson, M., & Harman, M. (2010). Reducing qualitative human oracle costs associated with automatically generated test data. In Proceedings of the First International Workshop on Software Test Output Validation, Trento, Italy (pp. 1–4).

  27. Naish, L., Lee, H. J., & Ramamohanarao, K. (2011). A model for spectra-based software diagnosis. ACM Transactions on Software Engineering and Methodology, 20(3), 11:1–11:32.

  28. Namin, A. S., Andrews, J. H., & Labiche, Y. (2006). Using mutation analysis for assessing and comparing testing coverage criteria. IEEE Transactions on Software Engineering, 32(8), 608–624.

  29. Offutt, A. J., Lee, A., Rothermel, G., Untch, R. H., & Zapf, C. (1996). An experimental determination of sufficient mutant operators. ACM Transactions on Software Engineering and Methodology, 5(2), 99–118.

  30. Ott, R. L. (1993). An introduction to statistical methods and data analysis (4th ed.). North Scituate: Duxbury Press.

  31. Renieris, M., & Reiss, S. P. (2003). Fault localization with nearest neighbor queries. In Proceedings of the 18th International Conference on Automated Software Engineering, Montreal, Canada (pp. 30–39).

  32. Santelices, R., Jones, J. A., Yu, Y., & Harrold, M. J. (2009). Lightweight fault-localization using multiple coverage types. In Proceedings of the 31st International Conference on Software Engineering, Vancouver, Canada (pp. 56–66).

  33. Shahamiri, S. R., Kadir, W. M. N. W., & Mohd-Hashim, S. Z. (2009). A comparative study on automated software test oracle methods. In Proceedings of the 4th International Conference on Software Engineering Advances, Porto, Portugal (pp. 140–145).

  34. The Software Infrastructure Repository. http://sir.unl.edu/portal/index.html.

  35. Wang, Y., Chen, Z., Feng, Y., Luo, B., & Yang, Y. (2012). Using weighted attributes to improve cluster test selection. In Proceedings of the 6th IEEE International Conference on Software Security and Reliability (SERE), Washington D.C. (pp. 138–146).

  36. Weiser, M. (1982). Programmers use slices when debugging. Communications of the ACM, 25(7), 446–452.

  37. Wong, W. E., Debroy, V., & Choi, B. (2010). A family of code coverage-based heuristics for effective fault localization. Journal of Systems and Software, 83(2), 188–208.

  38. Wong, W. E., Debroy, V., Gao, R., & Li, Y. (2014). The DStar method for effective software fault localization. IEEE Transactions on Reliability, 63(1), 290–308.

  39. Wong, W. E., Debroy, V., Golden, R., Xu, X., & Thuraisingham, B. (2012a). Effective software fault localization using an RBF neural network. IEEE Transactions on Reliability, 61(1), 149–169.

  40. Wong, W. E., Debroy, V., & Xu, D. (2012b). Towards better fault localization: A crosstab-based statistical approach. IEEE Transactions on Systems, Man, and Cybernetics—Part C, 42(3), 378–396.

  41. Wong, W. E., Horgan, J. R., London, S., & Mathur, A. P. (1998). Effect of test set minimization on fault detection effectiveness. Software—Practice and Experience, 28(4), 347–369.

  42. Wong, W. E., & Mathur, A. P. (1995a). Fault detection effectiveness of mutation and data flow testing. Software Quality Journal, 4(1), 69–83.

  43. Wong, W. E., & Mathur, A. P. (1995b). Reducing the cost of mutation testing: An empirical study. Journal of Systems and Software, 31(3), 185–196.

  44. Xie, X., Wong, W. E., Chen, T. Y., & Xu, B. (2013). Metamorphic slice: An application in spectrum-based fault localization. Information and Software Technology, 55(5), 866–879.

  45. Yan, S., Chen, Z., Zhao, Z., Zhang, C., & Zhou, Y. (2010). A dynamic test cluster sampling strategy by leveraging execution spectra information. In Proceedings of IEEE 3rd International Conference on Software Testing, Verification and Validation (ICST), Paris, France (pp. 147–154).

  46. Yu, Y., Jones, J. A., & Harrold, M. J. (2008). An empirical study on the effects of test-suite reduction on fault localization. In Proceedings of the International Conference on Software Engineering (ICSE), Leipzig, Germany (pp. 201–210).

  47. Zhang, X., Gupta, N., & Gupta, R. (2006). Locating faults through automated predicate switching. In Proceedings of the 28th International Conference on Software Engineering, Shanghai, China (pp. 272–281).

  48. Zhang, X., Gupta, N., & Gupta, R. (2007). A study of effectiveness of dynamic slicing in locating real faults. Empirical Software Engineering, 12(2), 143–160.

  49. Zhang, Z., Jiang, B., Chan, W. K., Tse, T. H., & Wang, X. (2010). Fault localization through evaluation sequences. Journal of System and Software, 83(2), 174–187.

  50. χSuds User’s Manual, Telcordia Technologies (1998).

Author information

Corresponding author

Correspondence to W. Eric Wong.

About this article

Cite this article

Gao, R., Wong, W.E., Chen, Z. et al. Effective software fault localization using predicted execution results. Software Qual J 25, 131–169 (2017). https://doi.org/10.1007/s11219-015-9295-1

Keywords

  • Program debugging
  • Software fault localization
  • Output verification
  • Statement suspiciousness
  • EXAM score