Abstract
Data collection and analysis are key activities in any software engineering experiment. However, the collected data may contain errors. We propose a data quality model specific to data obtained from software engineering experiments, which provides a framework for analyzing and improving these data. We apply the model to two controlled experiments and discover data quality problems that need to be addressed. We conclude that data quality issues must be considered before the experimental results are obtained.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Valverde, M.C., Vallespir, D., Marotta, A., Panach, J.I. (2014). Applying a Data Quality Model to Experiments in Software Engineering. In: Indulska, M., Purao, S. (eds) Advances in Conceptual Modeling. ER 2014. Lecture Notes in Computer Science, vol 8823. Springer, Cham. https://doi.org/10.1007/978-3-319-12256-4_18
DOI: https://doi.org/10.1007/978-3-319-12256-4_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12255-7
Online ISBN: 978-3-319-12256-4
eBook Packages: Computer Science, Computer Science (R0)