Abstract
This study examines the impact of noise on the evaluation of software quality imputation techniques. The imputation procedures evaluated in this work include Bayesian multiple imputation, mean imputation, nearest neighbor imputation, regression imputation, and REPTree (decision tree) imputation. These techniques were used to impute missing software measurement data in a dataset from a large military command, control, and communications system (CCCS). A randomized three-way complete block design analysis of variance model, with the average absolute error as the response variable, was built to analyze the imputation results. Multiple pairwise comparisons using Fisher and Tukey-Kramer tests were conducted to demonstrate the performance differences among the significant experimental factors. The underlying quality of the data was a significant factor affecting the accuracy of the imputation techniques. Bayesian multiple imputation and regression imputation were the top performers, while mean imputation was ineffective.
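As a rough illustration of the response variable named above, the following sketch computes the average absolute error of two of the listed techniques, mean imputation and nearest neighbor imputation, on a tiny made-up dataset. It is not the chapter's implementation: the data values, the column layout (two metrics plus a fault count), the use of a single nearest neighbor, and the Euclidean distance are all assumptions chosen for illustration.

```python
import math

def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def nearest_neighbor_impute(rows, target_col):
    """Fill a missing target with the target of the closest complete row.

    Distance is Euclidean over the non-target columns (an assumption;
    the chapter does not specify the metric here).
    """
    complete = [r for r in rows if r[target_col] is not None]
    imputed = []
    for r in rows:
        if r[target_col] is not None:
            imputed.append(r[target_col])
            continue

        def dist(other, r=r):
            return math.sqrt(sum(
                (a - b) ** 2
                for i, (a, b) in enumerate(zip(r, other))
                if i != target_col
            ))

        nearest = min(complete, key=dist)
        imputed.append(nearest[target_col])
    return imputed

def average_absolute_error(true_values, imputed_values, missing_idx):
    """Mean of |true - imputed| over the positions that were missing."""
    total = sum(abs(true_values[i] - imputed_values[i]) for i in missing_idx)
    return total / len(missing_idx)

# Hypothetical modules: (size metric, complexity metric, fault count).
rows_true = [
    (10.0, 2.0, 1.0),
    (12.0, 3.0, 2.0),
    (50.0, 9.0, 8.0),
    (55.0, 10.0, 9.0),
    (11.0, 2.0, 1.0),
    (52.0, 9.0, 8.0),
]
FAULTS = 2
missing_idx = [4, 5]  # fault counts hidden for these two modules

rows = [
    (r[0], r[1], None) if i in missing_idx else r
    for i, r in enumerate(rows_true)
]
true_faults = [r[FAULTS] for r in rows_true]

mean_result = mean_impute([r[FAULTS] for r in rows])
nn_result = nearest_neighbor_impute(rows, FAULTS)

mean_aae = average_absolute_error(true_faults, mean_result, missing_idx)
nn_aae = average_absolute_error(true_faults, nn_result, missing_idx)
print(f"mean imputation AAE: {mean_aae}")
print(f"nearest neighbor AAE: {nn_aae}")
```

On this toy data the mean fills every gap with the same value regardless of the other metrics, whereas the nearest neighbor borrows from a similar module. That is one intuition for why mean imputation can fare poorly, though the toy numbers say nothing about the CCCS results.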
Copyright information
© 2008 Springer London
About this chapter
Cite this chapter
Folleco, A., Khoshgoftaar, T., Van Hulse, J. (2008). Software Fault Imputation in Noisy and Incomplete Measurement Data. In: Pham, H. (ed.) Recent Advances in Reliability and Quality in Design. Springer Series in Reliability Engineering. Springer, London. https://doi.org/10.1007/978-1-84800-113-8_12
DOI: https://doi.org/10.1007/978-1-84800-113-8_12
Publisher Name: Springer, London
Print ISBN: 978-1-84800-112-1
Online ISBN: 978-1-84800-113-8
eBook Packages: Engineering, Engineering (R0)