
Applying a Data Quality Model to Experiments in Software Engineering

Conference paper in Advances in Conceptual Modeling (ER 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8823)


Abstract

The data collected and analyzed are key artifacts in any software engineering experiment. However, these data may contain errors. We propose a data quality model specific to data obtained from software engineering experiments, which provides a framework for analyzing and improving such data. Applying the model to two controlled experiments revealed data quality problems that need to be addressed. We conclude that data quality issues must be considered before the experimental results are obtained.
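This page does not reproduce the model itself. As a minimal sketch of what measuring quality over experimental data can look like, the Python below scores a small, hypothetical defect log on two dimensions commonly discussed in the data quality literature, completeness and syntactic accuracy, using the simple-ratio form (1 minus the fraction of undesirable outcomes). The record type, its fields, the allowed defect types, and the function names are all assumptions for illustration; they are not the dimensions, factors, or metrics defined in the paper.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record from an experiment's defect log; fields are illustrative only.
@dataclass
class DefectRecord:
    subject_id: str
    defect_type: Optional[str]     # None means the value was never recorded
    fix_time_min: Optional[float]  # None means the value was never recorded


def simple_ratio(undesirable: int, total: int) -> float:
    """Simple-ratio metric: 1 - undesirable/total (1.0 for an empty data set)."""
    return 1.0 if total == 0 else 1.0 - undesirable / total


def completeness(records: list[DefectRecord]) -> float:
    # Undesirable outcome: a record with at least one missing value.
    missing = sum(
        1 for r in records if r.defect_type is None or r.fix_time_min is None
    )
    return simple_ratio(missing, len(records))


def syntactic_accuracy(records: list[DefectRecord], allowed_types: set[str]) -> float:
    # Undesirable outcome: a recorded value that falls outside its domain
    # (an unknown defect type or a negative fix time). Missing values are
    # not counted here; completeness already covers them.
    invalid = sum(
        1
        for r in records
        if (r.defect_type is not None and r.defect_type not in allowed_types)
        or (r.fix_time_min is not None and r.fix_time_min < 0)
    )
    return simple_ratio(invalid, len(records))


# Hypothetical sample data with one missing value and two out-of-domain values.
allowed = {"logic", "syntax", "interface"}
records = [
    DefectRecord("s1", "logic", 12.0),
    DefectRecord("s1", None, 5.0),      # missing defect type
    DefectRecord("s2", "syntax", -3.0), # negative fix time
    DefectRecord("s2", "typo?", 4.0),   # defect type outside the allowed domain
]

if __name__ == "__main__":
    print(f"completeness={completeness(records):.2f}")
    print(f"syntactic_accuracy={syntactic_accuracy(records, allowed):.2f}")
```

Each metric yields a value in [0, 1], where 1 means no problem of that kind was detected; a low score flags records to inspect or clean before the experimental results are computed.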




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Valverde, M.C., Vallespir, D., Marotta, A., Panach, J.I. (2014). Applying a Data Quality Model to Experiments in Software Engineering. In: Indulska, M., Purao, S. (eds) Advances in Conceptual Modeling. ER 2014. Lecture Notes in Computer Science, vol 8823. Springer, Cham. https://doi.org/10.1007/978-3-319-12256-4_18


  • DOI: https://doi.org/10.1007/978-3-319-12256-4_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12255-7

  • Online ISBN: 978-3-319-12256-4

  • eBook Packages: Computer Science, Computer Science (R0)
