Applying a Data Quality Model to Experiments in Software Engineering

  • María Carolina Valverde
  • Diego Vallespir
  • Adriana Marotta
  • Jose Ignacio Panach
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8823)


Data collection and analysis are key activities in any software engineering experiment. However, the collected data might contain errors. We propose a Data Quality model specific to data obtained from software engineering experiments, which provides a framework for analyzing and improving these data. Applying the model to two controlled experiments revealed data quality problems that need to be addressed. We conclude that data quality issues must be considered before the experimental results are obtained.


data quality, software engineering, controlled experiments





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • María Carolina Valverde (1)
  • Diego Vallespir (1)
  • Adriana Marotta (1)
  • Jose Ignacio Panach (2)
  1. Universidad de la República, Montevideo, Uruguay
  2. Departament d’Informàtica, Universitat de València, Valencia, Spain
