Abstract
Continuous Integration (CI) is a cornerstone of modern quality assurance, providing on-demand builds (compilation and tests) of code changes or software releases. Yet, existing CI systems do little to help developers interpret build results, in particular when facing build inflation. Build inflation arises when each code change must be built on dozens of combinations (configurations) of runtime environments (REs), operating systems (OSes), and hardware architectures (HAs). A code change C1 sent to the CI system may introduce programming faults that cause all of these builds to fail, while a change C2 introducing a new library dependency might cause only one particular build configuration to fail. Consequently, the single build failure due to C2 will be “hidden” among the dozens of build failures due to C1 when the CI system reports the build results. We call this phenomenon build inflation because it may bias developers’ interpretation of build results by “hiding” certain types of faults.
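The hiding effect described above can be sketched with a toy build matrix. All package names, versions, and faults below are hypothetical and chosen only to illustrate the imbalance; CPAN Testers involves far more configurations:

```python
from itertools import product

# Hypothetical build matrix: runtime environments (REs) x operating systems (OSes).
perl_versions = ["5.8", "5.10", "5.12", "5.14"]
oses = ["linux", "freebsd", "darwin", "mswin32"]

def build(change, perl, os):
    """Simulated build outcome: C1 carries a programming fault that fails
    on every configuration; C2 adds a dependency that is missing on one
    specific configuration only."""
    if change == "C1":
        return "FAIL (syntax error)"
    if change == "C2" and (perl, os) == ("5.8", "mswin32"):
        return "FAIL (missing module)"
    return "PASS"

results = {(c, p, o): build(c, p, o)
           for c in ("C1", "C2")
           for p, o in product(perl_versions, oses)}

failures = [key for key, outcome in results.items()
            if outcome.startswith("FAIL")]

# A flat report lists 17 red builds; the single configuration-specific
# failure of C2 is easily overlooked among the 16 failures of C1.
print(sum(1 for f in failures if f[0] == "C1"))  # 16
print(sum(1 for f in failures if f[0] == "C2"))  # 1
```

A flat list of failed builds gives both changes the same visual weight per failure, which is exactly why C2's environment-specific fault gets drowned out.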
In this paper, we study build inflation through a large-scale analysis of the relationship between REs, OSes, and build failures across 30 million builds of the CPAN repository on the CPAN Testers package-level CI system. We show that the builds of Perl packages may fail differently on different REs, different OSes, and any combination thereof. Thus, the results provided by CPAN Testers require filtering and selection to identify real trends of build failures among the many failures. A manual analysis of 791 build failures shows that dependency faults (missing modules) and programming faults (undefined values) are the main reasons for failures, with dependency faults being easier to fix. We conclude with recommendations for practitioners and researchers on interpreting build results, as well as for tool builders, who should improve the scheduling of builds and the reporting of build failures.
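One simple filtering strategy in the spirit of these findings (a sketch over made-up CPAN Testers-style report tuples, not the paper's actual analysis) is to aggregate failures per build configuration before reading individual results, separating faults that fail everywhere from configuration-specific ones:

```python
from collections import defaultdict

# Hypothetical reports: (package, perl_version, os, status).
# Package names and data are invented for illustration.
reports = [
    ("Foo-Bar", "5.14", "linux",   "FAIL"),
    ("Foo-Bar", "5.14", "freebsd", "FAIL"),
    ("Foo-Bar", "5.8",  "linux",   "FAIL"),
    ("Baz-Qux", "5.8",  "mswin32", "FAIL"),
    ("Baz-Qux", "5.8",  "linux",   "PASS"),
    ("Baz-Qux", "5.14", "linux",   "PASS"),
]

# For each package, compare the set of failing configurations with the
# set of all configurations it was built on.
per_package = defaultdict(lambda: {"fail": set(), "all": set()})
for pkg, perl, os, status in reports:
    per_package[pkg]["all"].add((perl, os))
    if status == "FAIL":
        per_package[pkg]["fail"].add((perl, os))

for pkg, cfgs in sorted(per_package.items()):
    if cfgs["fail"] == cfgs["all"]:
        verdict = "fails everywhere: likely a programming fault"
    elif len(cfgs["fail"]) == 1:
        verdict = "fails on one configuration: likely environment-specific"
    elif cfgs["fail"]:
        verdict = "mixed failures"
    else:
        verdict = "passes"
    print(pkg, "->", verdict)
```

Grouping by configuration rather than counting raw failures is one way a dashboard could surface a single dependency fault that a flat list would bury.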
Notes
In this paper, we use the term “package” in its usual sense while Perl developers talk about “distribution”.
Most of the vectors did not contain any build failure, which is expected.
The models are not useful to predict build failures in practice because they only include OSes and REs and ignore other factors. However, they are useful to validate the extent to which OSes and REs alone explain build failures, i.e., to validate the strength of the link between build configurations and build failures.
Acknowledgements
Part of this work was funded by the NSERC Discovery Grant and Canada Research Chair programs.
Additional information
Communicated by: Romain Robbes
Cite this article
Zolfagharinia, M., Adams, B. & Guéhéneuc, YG. A study of build inflation in 30 million CPAN builds on 13 Perl versions and 10 operating systems. Empir Software Eng 24, 3933–3971 (2019). https://doi.org/10.1007/s10664-019-09709-6