Productivity Reanalysis for Unbalanced Datasets with Mixed-Effects Models

Amasaki, Sousuke

doi:10.1007/978-3-642-13792-1_22

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6156)

Included in the following conference series:

International Conference on Product Focused Software Process Improvement

Abstract

Data analysis is a central activity in software engineering research. For example, productivity analysis and the evaluation of new technologies almost always involve statistical analysis of collected data. Software data are usually unbalanced because they are collected from actual projects rather than from formal experiments, and their population is therefore biased. Fixed-effects models have often been used for such analysis even though they are intended for balanced datasets. This misuse leads to insufficient analysis and wrong conclusions. A past study [1] proposed an iterative procedure for handling unbalanced datasets in productivity analysis. However, this procedure sometimes failed to identify partially confounded factors, and its estimated effects were not easy to interpret. This study examined mixed-effects models for productivity analysis. Mixed-effects models work the same way for unbalanced datasets as for balanced ones. Furthermore, their application is straightforward and their estimated effects are easy to interpret. Experiments with four datasets clearly showed the advantages of mixed-effects models.
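
To give a rough feel for why a random-effects term copes with unbalanced group sizes, the following sketch applies the standard partial-pooling (shrinkage) formula from the multilevel-modelling literature to a toy dataset. It is not the procedure used in the paper: the data values, the function name `partial_pooling`, and the two variance components are hypothetical, and the variances are assumed to be known here, whereas a real mixed-effects fit would estimate them from the data. The pooled mean of all observations stands in for the population mean.

```python
from statistics import mean


def partial_pooling(groups, sigma_y2, sigma_a2):
    """Shrink each group's mean toward the pooled mean of all observations.

    groups   : dict mapping a factor level to its list of observations
    sigma_y2 : assumed within-group (residual) variance
    sigma_a2 : assumed between-group variance of the random effect
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    estimates = {}
    for level, values in groups.items():
        n = len(values)
        w_group = n / sigma_y2    # precision of the group's own mean
        w_pool = 1.0 / sigma_a2   # precision contributed by the population level
        estimates[level] = (
            w_group * mean(values) + w_pool * grand_mean
        ) / (w_group + w_pool)
    return grand_mean, estimates


# Hypothetical log-productivity values for three levels of one factor.
# The design is unbalanced: level "C" is represented by a single project.
data = {
    "A": [1.0, 1.2, 0.8, 1.1, 0.9, 1.0],
    "B": [1.6, 1.4, 1.5],
    "C": [2.5],
}

grand_mean, est = partial_pooling(data, sigma_y2=0.04, sigma_a2=0.25)

for level, values in data.items():
    print(level, len(values), round(mean(values), 3), round(est[level], 3))
```

In this toy setting the estimate for the single-project level "C" is pulled noticeably toward the pooled mean, while the estimate for the six-project level "A" barely moves. That weighting by group size is what lets the same model form be applied to balanced and unbalanced data alike.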

References

1. Kitchenham, B.: A procedure for analyzing unbalanced datasets. IEEE Trans. on Software Engineering 24(4), 278–301 (1998)
2. Kemerer, C.F., Paulk, M.C.: The impact of design and code reviews on software quality: An empirical study based on PSP data. IEEE Trans. on Software Engineering 35(4), 534–550 (2009)
3. Bazeghi, C., Mesa-Martinez, F.J., Renau, J.: μComplexity: Estimating processor design effort. In: Proc. of 38th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 209–218 (2005)
4. Reinhartz-Berger, I., Dori, D.: OPM vs. UML – experimenting with comprehension and construction of web application models. Empirical Software Engineering 10, 57–79 (2005)
5. Lawrie, D., Feild, H., Binkley, D.: Quantifying identifier quality: an analysis of trends. Empirical Software Engineering 12, 359–388 (2007)
6. Gelman, A., Hill, J.: Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, Cambridge (2007)
7. Boehm, B.W.: Software Engineering Economics. Prentice-Hall, Englewood Cliffs (1981)
8. Shirabad, J.S., Menzies, T.J.: The PROMISE repository of software engineering databases. In: School of Information Technology and Engineering. University of Ottawa, Canada (2005)

Author information

Authors and Affiliations

Department of Systems Engineering, Okayama Prefectural University, 111 Kuboki, Soja, Okayama, 719-1197, Japan
Sousuke Amasaki

Editor information

Editors and Affiliations

Software Development Group, IT University of Copenhagen, Rued Langgaards Vej 7, 2300, Copenhagen, Denmark
M. Ali Babar
VTT Technical Research Centre of Finland, Kaitoväylä 1, 90570, Oulu, Finland
Matias Vierimaa
Department of Information Processing Science, University of Oulu, P.O. Box 3000, 90014, Oulu, Finland
Markku Oivo

About this paper

Cite this paper

Amasaki, S. (2010). Productivity Reanalysis for Unbalanced Datasets with Mixed-Effects Models. In: Ali Babar, M., Vierimaa, M., Oivo, M. (eds) Product-Focused Software Process Improvement. PROFES 2010. Lecture Notes in Computer Science, vol 6156. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13792-1_22

DOI: https://doi.org/10.1007/978-3-642-13792-1_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13791-4
Online ISBN: 978-3-642-13792-1
eBook Packages: Computer Science, Computer Science (R0)
