Transfer learning in effort estimation

Kocaguneli, Ekrem; Menzies, Tim; Mendes, Emilia

doi:10.1007/s10664-014-9300-5

Transfer learning in effort estimation

Published: 29 March 2014

Volume 20, pages 813–843, (2015)
Cite this article

Empirical Software Engineering Aims and scope Submit manuscript

Ekrem Kocaguneli¹,
Tim Menzies¹ &
Emilia Mendes²

1600 Accesses
60 Citations
1 Altmetric
Explore all metrics

Abstract

When projects lack sufficient local data to make predictions, they try to transfer information from other projects. How can we best support this process? In the field of software engineering, transfer learning has been shown to be effective for defect prediction. This paper checks whether it is possible to build transfer learners for software effort estimation. We use data on 154 projects from 2 sources to investigate transfer learning between different time intervals and 195 projects from 51 sources to provide evidence on the value of transfer learning for traditional cross-company learning problems. We find that the same transfer learning method can be useful for transfer effort estimation results for the cross-company learning problem and the cross-time learning problem. It is misguided to think that: (1) Old data of an organization is irrelevant to current context or (2) data of another organization cannot be used for local solutions. Transfer learning is a promising research direction that transfers relevant cross data between time intervals and domains.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

From Function Points to COSMIC - A Transfer Learning Approach for Effort Estimation

Which models of the past are relevant to the present? A software effort estimation approach to exploiting useful past models

Article Open access 28 December 2016

Cross-Project Defect Prediction: Leveraging Knowledge Transfer for Improved Software Quality Assurance

Notes

In the literature, this is known as a negative transfer effect (Pan and Yang 2010) where transfer learning actually makes matters worse.
Terminology note: Kitchenham et al. called such transfers “cross-company” learning.
Examples of delphi localizations come from Boehm (1981) and Petersen and Wohlin (2009). Boehm divided software projects into one of the “embedded”, “semi-detached” or “organic” projects and offered different COCOMO-I effort models for each. Petersen & Wohlin offer a rich set of dimensions for contextualizing projects (processes, product, organization, etc).
http://goo.gl/WxGXv
http://goo.gl/ioXDy
Note that the literature contains numerous synonyms for data set shift including “concept shift” or “concept drift”, “changes of classification”, “changing environments”, “contrast mining in classification learning”,“fracture points” and “fractures between data”. We will use the term as defined in the above text.

References

Alpaydin E (2010) Introduction to Machine Learning, 2nd edn. MIT Press
Arnold A, Nallapati R, Cohen W (2007) A comparative study of methods for transductive transfer learning. In: ICDM’07: 17th IEEE international conference on data mining workshops, pp 77 –82
Bettenburg N, Nagappan M, Hassan AE (2012) Think locally, act globally: improving defect and effort prediction models. In MSR’12
Boehm B (1981) Software engineering economics. Prentice hall
Chang C-l (1974) Finding prototypes for nearest classifiers. IEEE Trans Comput C3(11)
Corazza A, Di Martino S, Ferrucci F, Gravino C, Sarro F, Mendes E (2010) How effective is tabu search to configure support vector regression for effort estimation? In: Proceedings of the 6th international conference on predictive models in software engineering
Rodriguez D, Herraiz I, Harrison R (2012) On software engineering repositories and their open problems. In: Proceedings RAISE’12
Dai W, Xue G-R, Yang Q, Yong Y (2007) Transferring naive bayes classifiers for text classification. In: AAAI’07: Proceedings of the 22nd national conference on artificial intelligence, pp 540–545
Foss T, Stensrud E, Kitchenham B, Myrtveit I (2003) A simulation study of the model evaluation criterion mmre. IEEE Trans Softw Eng 29(11):985–995
Article Google Scholar
Foster G, Goutte C, Kuhn R (2010) Discriminative instance weighting for domain adaptation in statistical machine translation. In: EMNLP ’10: conference on empirical methods in natural language processing, pp 451–459
Gao J, Fan W, Jiang J, Han J (2008) Knowledge transfer via multiple model local structure mapping. In: International conference on knowledge discovery and data mining. Las Vegas, NV
Harman M, Jia Y, Zhang Y (2012) App store mining and analysis: Msr for app stores. In: MSR, pp 108–111
Hastie T, Tibshirani R, Friedman J (2008) The elements of statistical learning: data mining, inference and prediction, 2nd edn. Springer
Hayes JH, Dekhtyar A, Sundaram SK (2006) Advancing candidate link generation for requirements tracing: the study of methods. IEEE Trans Softw Eng 32(1):4–19
Article Google Scholar
He Z, Shu F, Yang Y, Li M, Wang Q (2012) An investigation on the feasibility of cross-project defect prediction. Autom Softw Eng 19:167–199
Article Google Scholar
Hihn J, Habib-agahi H (1991) Cost estimation of software intensive projects: a survey of current practices. In: 13th international conference on software engineering 1991, pp 276 –287
Hindle A (2012) Green mining: a methodology of relating software change to power consumption. In: Proceedings, MSR’12
Huang J, Smola A, Gretton A, Borgwardt K, Scholkopf B (2007) Correcting sample selection bias by unlabeled data. In: Proceedings of the 19th Annual Conference on Neural Information Processing Systems, pp 601–608
Jiang Y, Cukic B, Menzies T, Bartlow N (2008) Comparing design and code metrics for software quality prediction. In: Proceedings PROMISE 2008, pp 11–18
Kadoda G, Cartwright M, Shepperd M (2000) On configuring a case-based reasoning software project prediction system. UK CBR Workshop, Cambridge, UK, pp 1–10
Keung J (2008) Empirical evaluation of analogy-x for software cost estimation. In: ESEM ’08: Proceedings of the second international symposium on empirical software engineering and measurement. ACM, New York, NY, pp 294–296
Keung J, Kocaguneli E, Menzies T (2012) Finding conclusion stability for selecting the best effort predictor in software effort estimation. Automated Software Engineering, pp 1–25. doi:10.1007/s10515-012-0108-5
Kitchenham BA, Mendes E, Travassos GH (2007) Cross versus within-company cost estimation studies: a systematic review. IEEE Trans Softw Eng 33(5):316–329
Article Google Scholar
Kocaguneli E, Gay G, Yang Y, Menzies T, Keung J (2010) When to use data from other projects for effort estimation. In: ASE ’10: Proceedings of the international conference on automated software engineering (short paper). New York, NY
Kocaguneli E, Menzies T (2011) How to find relevant data for effort estimation. In: ESEM’11: international symposium on empirical software engineering and measurement
Kocaguneli E, Menzies T (2012) Software effort models should be assessed via leave-one-out validation. Under Review
Kocaguneli E, Menzies T, Bener A, Keung JW (2012) Exploiting the essential assumptions of analogy-based effort estimation. IEEE Trans Softw Eng 38(2):425–438
Article Google Scholar
Lee S-I, Chatalbashev V, Vickrey D, Koller D (2007) Learning a meta-level prior for feature relevance from multiple related tasks. In: ICML ’07: Proceedings of the 24th international conference on machine learning, pp 489–496
Li Y, Xie M, Goh T (2009) A study of project selection and feature weighting for analogy based software cost estimation. J Syst Softw 82:241–252
Article Google Scholar
Lokan C, Mendes E (2009a) Applying moving windows to software effort estimation. In: ESEM’09: Proceedings of the 3rd international symposium on empirical software engineering and measurement, pp 111–122
Lokan C, Mendes E (2009b) Using chronological splitting to compare cross- and single-company effort models: further investigation. In: Proceedings of the thirty-second Australasian conference on computer science, vol 91. ACSC ’09, pp 47–54
Ma Y, Luo G, Zeng X, Chen A (2012) Transfer learning for cross-company software defect prediction. Inf Softw Technol 54(3):248–256
Article Google Scholar
Mendes E, Mosley N (2008) Bayesian network models for web effort prediction: a comparative study. IEEE Trans Softw Eng 34:723–737
Article Google Scholar
Mendes E, Mosley N, Counsell S (2005) Investigating web size metrics for early web cost estimation. J Syst Softw 77:157–172
Article Google Scholar
Mendes E, Watson ID, Triggs C, Mosley N, Counsell S (2003) A comparative study of cost estimation models for web hypermedia applications. Empir Softw Eng 8(2):163–196
Article Google Scholar
Menzies T, Butcher A, Cok D, Marcus A, Layman L, Shull F, Turhan B, Zimmermann T (2012) Local vs. global lessons for defect prediction and effort estimation. In: IEEE transactions on software engineering, p 1
Menzies T, Butcher A, Marcus A, Zimmermann T, Cok D (2011) Local vs global models for effort estimation and defect prediction. In: IEEE ASE’11. Available from http://menzies.us/pdf/11ase.pdf
Menzies T, Greenwald J, Frank A (2007) Data mining static code attributes to learn defect predictors. In: IEEE transactions on software engineering. Available from http://menzies.us/pdf/06learnPredict.pdf
Mihalkova L, Huynh T, Mooney RJ (2007) Mapping and revising markov logic networks for transfer learning. In: AAAI’07: Proceedings of the 22nd national conference on Artificial intelligence, pp 608–614
Milicic D, Wohlin C (2004) Distribution patterns of effort estimations. In: Euromicro conference series on software engineering and advanced applications, pp 422–429
Minku LL, Yao X (2012) Can cross-company data improve performance in software effort estimation? In: PROMISE ’12: Proceedings of the 8th international conference on predictive models in software engineering
Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359
Article Google Scholar
Petersen K, Wohlin C (2009) Context in industrial software engineering research. In: 3rd International Symposium on empirical software engineering and measurement, 2009. ESEM 2009, pp 401 –404
Posnett D, Filkov V, Devanbu P (2011) Ecological inference in empirical software engineering. In: Proceedings of ASE’11
Reifer D, Boehm BW, Chulani S (1999) The Rosetta Stone: Making COCOMO 81 Estimates Work with COCOMO II. Crosstalk. The Journal of Defense Software Engineering, pp 11–15
Robson C (2002) Real world research: a resource for social scientists and practitioner-researchers. Blackwell Publisher Ltd
Shepperd M, MacDonell S (2012) Evaluating prediction systems in software project estimation. Inf Softw Technol 54(8):820–827
Article Google Scholar
Shepperd M, Schofield C (1997) Estimating software project effort using analogies. IEEE Trans Softw Eng 23(11):736–743
Article Google Scholar
Stensrud E, Foss T, Kitchenham B, Myrtveit I (2002) An empirical validation of the relationship between the magnitude of relative error and project size. In: Proceedings of the 8th IEEE symposium on software metrics, pp 3–12
Storkey A (2009) When training and test sets are different: characterizing learning transfer. In: Candela J, Sugiyama M, Schwaighofer A, Lawrence, N (eds) Dataset shift in machine learning. MIT Press, Cambridge, pp 3–28
Turhan B (2012) On the dataset shift problem in software engineering prediction models. Empir Softw Eng 17:62–74
Article MathSciNet Google Scholar
Turhan B, Menzies T, Bener A, Di Stefano J (2009) On the relative value of cross-company and within-company data for defect prediction. Empir Softw Eng 14(5):540–578
Article Google Scholar
Wu P, Dietterich TG (2004) Improving svm accuracy by training on auxiliary data sources. In: Proceedings of the twenty-first international conference on Machine learning, ICML ’04. ACM, New York, NY, p 110
Yang Y, Xie L, He Z, Li Q, Nguyen V, Boehm BW, Valerdi R (2011) Local bias and its impacts on the performance of parametric estimation models. In: PROMISE
Zhang H, Sheng S (2004) Learning weighted naive bayes with accurate ranking. In: ICDM ’04 4th IEEE international conference on data mining, pp 567–570
Zhang X, Dai W, Xue G-R, Yu Y (2007) Adaptive email spam filtering based on information theory. In: Web information systems engineering WISE 2007, Lecture notes in computer science, vol 4831. Springer Berlin/Heidelberg, pp 159–170
Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B (2009) Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. ESEC/FSE, pp 91–100
žliobaitė I (2010) Learning under concept drift: an overview. CoRR, arXiv:1010.4784

Download references

Acknowledgments

The work was partially funded by NSF CCF grant, award number 1302169, and the Qatar/West Virginia University research grant NPRP 09-12-5-2-470.

Author information

Authors and Affiliations

Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, 26505, USA
Ekrem Kocaguneli & Tim Menzies
Department of Software Engineering, Faculty of Computing, Blekinge Institute of Technology, 37179, Karlskrona, Sweden
Emilia Mendes

Authors

Ekrem Kocaguneli
View author publications
You can also search for this author in PubMed Google Scholar
Tim Menzies
View author publications
You can also search for this author in PubMed Google Scholar
Emilia Mendes
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ekrem Kocaguneli.

Additional information

Communicated by: Martin Shepperd

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kocaguneli, E., Menzies, T. & Mendes, E. Transfer learning in effort estimation. Empir Software Eng 20, 813–843 (2015). https://doi.org/10.1007/s10664-014-9300-5

Download citation

Published: 29 March 2014
Issue Date: June 2015
DOI: https://doi.org/10.1007/s10664-014-9300-5

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Transfer learning in effort estimation

Abstract

Access this article

Similar content being viewed by others

From Function Points to COSMIC - A Transfer Learning Approach for Effort Estimation

Which models of the past are relevant to the present? A software effort estimation approach to exploiting useful past models

Cross-Project Defect Prediction: Leveraging Knowledge Transfer for Improved Software Quality Assurance

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Transfer learning in effort estimation

Abstract

Access this article

Similar content being viewed by others

From Function Points to COSMIC - A Transfer Learning Approach for Effort Estimation

Which models of the past are relevant to the present? A software effort estimation approach to exploiting useful past models

Cross-Project Defect Prediction: Leveraging Knowledge Transfer for Improved Software Quality Assurance

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation