Abstract
While empirical methods have a long tradition in domains such as performance evaluation, applying empirical methods with human subjects to evaluate the usability of programming techniques, programming language constructs, or whole programming languages is relatively new (or, at least, running such studies is becoming more common). Despite the urgent need for such usability studies, few researchers are well-versed in these techniques, certainly when compared to the large number of researchers inventing new programming techniques or formal approaches. The main goal of this text is to introduce empirical methods for evaluating programming language constructs, with a strong focus on quantitative methods. The paper concludes by explaining how and why a series of controlled experiments was gradually designed to study the usability of type systems.
Notes
- 1.
According to Hanenberg [14], the phrase software science is used to describe research on software artifacts in general. While the term software engineering is used much more often, the programming language community in particular, as well as people doing performance measurements, feel that this term does not adequately describe their domains. We think that the term software science, although originally used by Halstead [12] for something different, is more appropriate for describing the whole domain of software-related research.
- 2.
Sheil even called the study of programming as practiced by computer science ‘an unholy mixture of mathematics, literary criticism, and folklore’ [37, p. 102].
- 3.
It should be noted that a recent study was unable to reveal a measurable benefit of lambda expressions in C++; instead, it showed a measurable disadvantage, at least for non-professional programmers (see [45]).
- 4.
Again, to get an impression of how far from mainstream this is: according to Kaijanaho, only 22 randomized controlled trials on the human-factors comparative evaluation of language features had been conducted up to 2012 (see [22, p. 143]).
- 5.
Additionally, the collection of Victor Basili’s papers edited by Boehm et al. [1] gives a larger set of examples of controlled trials that have been performed.
- 6.
- 7.
- 8.
The corresponding non-parametric tests [5] are valid here, too; i.e., it is possible to analyse the crossover trial using a U-test and a Wilcoxon test.
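The non-parametric analysis this note mentions can be sketched as follows; this is a minimal illustration assuming Python with SciPy is available, and the completion times and grouping are invented example data, not data from any of the cited studies:

```python
# Hypothetical sketch: non-parametric analysis of a two-period
# crossover trial, as the note suggests. All numbers are made up.
from scipy.stats import mannwhitneyu, wilcoxon

# Completion times (minutes) of eight subjects, each measured once
# under each treatment (e.g. with and without static types).
times_A = [12, 15, 11, 14, 13, 16, 10, 17]
times_B = [13, 17, 14, 18, 18, 22, 17, 25]

# Within-subject comparison: Wilcoxon signed-rank test on the pairs.
w_stat, w_p = wilcoxon(times_A, times_B)

# Comparison of two samples treated as independent (e.g. the two
# sequence groups of the crossover design): Mann-Whitney U-test.
u_stat, u_p = mannwhitneyu(times_A, times_B)

print(f"Wilcoxon p = {w_p:.4f}, U-test p = {u_p:.4f}")
```

Both tests operate on ranks rather than raw values, which is why they remain valid when the normality assumption behind a t-test is questionable.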
- 9.
The rather arbitrary choice of .05 is probably so common because it was originally proposed by Fisher [8], although some other disciplines use a different alpha level.
- 10.
The points are verbatim citations from Souza and Figueiredo [39].
- 11.
Two other questions are formulated, which are skipped here for reasons of simplification.
- 12.
It is understandable that the authors do not apply inference-statistical methods: a huge number of different words is tested, and it is plausible that traditional approaches from inference statistics would not have revealed any differences at all, because of the high number of variables.
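The multiple-comparison problem behind this note can be made concrete with a small calculation; this is a generic illustration (the number of tests is an invented example, not a figure from the study):

```python
# Hypothetical illustration of the multiple-comparison problem:
# testing many variables at alpha = .05 almost guarantees false
# positives, and correcting for this makes each test very strict.
alpha = 0.05
num_tests = 200  # assumed number of distinct words being compared

# Without correction, the chance of at least one false positive
# across all tests (assuming independence) is nearly 1.
p_any_false_positive = 1 - (1 - alpha) ** num_tests

# Bonferroni correction: each individual test must meet a far
# smaller threshold to keep the family-wise error rate at alpha.
corrected_alpha = alpha / num_tests

print(p_any_false_positive, corrected_alpha)
```

With 200 tests, the corrected per-test threshold drops to .00025, which explains why genuine but small differences would likely go undetected.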
- 13.
The authors distinguish in their paper between a third and a fourth study, which we present here as one because the hypotheses and the applied analysis methods were identical.
- 14.
At least, this statement can be found in the work by Kaijanaho [22].
- 15.
The result of the experiment was that the additional type annotations of generic Java helped when using an undocumented API, which (again) confirmed the previous findings, but which also showed a situation where generic types reduced the extensibility of an API (see [20]).
- 16.
References
Boehm, B., Rombach, H.D., Zelkowitz, M.V.: Foundations of Empirical Software Engineering: The Legacy of Victor R. Basili. Springer, Heidelberg (2005)
Bracha, G.: Pluggable type systems. In: OOPSLA’04 Workshop on Revival of Dynamic Languages (2004)
Bruce, K.B.: Foundations of Object-Oriented Languages: Types and Semantics. MIT Press, Cambridge (2002)
Cohen, J.: Statistical Power Analysis for the Behavioral Sciences. L. Erlbaum Associates, Hillsdale (1988)
Conover, W.J.: Practical Nonparametric Statistics, 3rd edn. Wiley, New York (1998)
Endrikat, S., Hanenberg, S., Robbes, R., Stefik, A.: How do API documentation and static typing affect API usability? In: 36th International Conference on Software Engineering, ICSE 2014, Hyderabad, India - 31 May–07 June 2014, pp. 632–642 (2014)
Fischer, L., Hanenberg, S.: An empirical investigation of the effects of type systems and code completion on API usability using TypeScript and JavaScript in MS Visual Studio. In: Proceedings of the Dynamic Languages Symposium, DLS 2015. Accepted for publication (2015)
Fisher, R.A.: Statistical Methods for Research Workers. Cosmo Study Guides. Cosmo Publications, New Delhi (1925)
Gannon, J.D.: An experimental evaluation of data type conventions. Commun. ACM 20(8), 584–595 (1977)
Georges, A., Buytaert, D., Eeckhout, L.: Statistically rigorous Java performance evaluation. SIGPLAN Not. 42(10), 57–76 (2007)
Glaser, B.G., Strauss, A.L.: The Discovery of Grounded Theory: Strategies for Qualitative Research. Observations. Aldine Publishing Company, Chicago (1967)
Halstead, M.H.: Elements of Software Science (Operating and Programming Systems Series). Elsevier Science Inc., New York (1977)
Hanenberg, S.: An experiment about static and dynamic type systems: Doubts about the positive impact of static type systems on development time. In: Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA, pp. 22–35. ACM, New York (2010)
Hanenberg, S.: Faith, hope, and love: An essay on software science’s neglect of human factors. In: Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages And Applications, OOPSLA 2010, pp. 933–946. Reno/Tahoe, Nevada, October 2010
Hanenberg, S.: Why do we know so little about programming languages, and what would have happened if we had known more? In: Proceedings of the 10th ACM Symposium on Dynamic Languages, DLS 2014, p. 1. ACM, New York (2014)
Hanenberg, S., Kleinschmager, S., Robbes, R., Tanter, É., Stefik, A.: An empirical study on the impact of static typing on software maintainability. Empirical Softw. Eng. 19(5), 1335–1382 (2014)
Hanenberg, S., Stefik, A.: On the need to define community agreements for controlled experiments with human subjects - a discussion paper. Submitted to PLATEAU 2015 (2015)
Harlow, L.L., Mulaik, S.A., Steiger, J.H.: What If There Were No Significance Tests? Multivariate Applications Book Series. Lawrence Erlbaum Associates Publishers, Hillsdale (1997)
Hoda, R., Noble, J., Marshall, S.: Developing a grounded theory to explain the practices of self-organizing agile teams. Empirical Softw. Eng. 17(6), 609–639 (2012)
Hoppe, M., Hanenberg, S.: Do developers benefit from generic types? An empirical comparison of generic and raw types in Java. In: Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA 2013, pp. 457–474. ACM, New York (2013)
Juristo, N., Moreno, A.M.: Basics of Software Engineering Experimentation. Springer, Heidelberg (2001)
Kaijanaho, A.-J.: Evidence-based programming language design: A philosophical and methodological exploration. Number 222 in Jyväskylä Studies in Computing. University of Jyväskylä, Finland (2015)
Kirk, R.E.: Experimental Design: Procedures for the Behavioral Sciences. SAGE Publications, Thousand Oaks (2012)
Kleinschmager, S., Hanenberg, S., Robbes, R., Tanter, É., Stefik, A.: Do static type systems improve the maintainability of software systems? An empirical study. In: IEEE 20th International Conference on Program Comprehension, ICPC 2012, Passau, Germany, pp. 153–162, 11–13 June 2012
Ko, A.J., LaToza, T.D., Burnett, M.M.: A practical guide to controlled experiments of software engineering tools with human participants. Empirical Softw. Eng. 20(1), 110–141 (2015)
Laprie, J.-C.: Dependability of computer systems: concepts, limits, improvements. In: Sixth International Symposium on Software Reliability Engineering, ISSRE 1995, Toulouse, France, 24–27 October 1995, pp. 2–11 (1995)
Mayer, C., Hanenberg, S., Robbes, R., Tanter, É., Stefik, A.: An empirical study of the influence of static type systems on the usability of undocumented software. In: Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2012, part of SPLASH 2012, Tucson, AZ, USA, 21–25 October 2012, pp. 683–702. ACM (2012)
McConnell, S.: What does 10x mean? Measuring variations in programmer productivity. In: Oram, A., Wilson, G. (eds.) Making Software: What Really Works, and Why We Believe It, O’Reilly Series, pp. 567–575. O’Reilly Media (2010)
Okon, S., Hanenberg, S.: Can we enforce a benefit for dynamically typed languages in comparison to statically typed ones? A controlled experiment. In: 2016 IEEE 24th International Conference on Program Comprehension (ICPC), pp. 1–10, May 2016
Parnin, C., Bird, C., Murphy-Hill, E.R.: Java generics adoption: How new features are introduced, championed, or ignored. In: Proceedings of the 8th International Working Conference on Mining Software Repositories, MSR 2011 (Co-located with ICSE), Waikiki, Honolulu, HI, USA, 21–28 May 2011, pp. 3–12. IEEE (2011)
Petersen, P., Hanenberg, S., Robbes, R.: An empirical comparison of static and dynamic type systems on API usage in the presence of an IDE: Java vs. groovy with eclipse. In: 22nd International Conference on Program Comprehension, ICPC 2014, Hyderabad, India, 2–3 June 2014, pp. 212–222 (2014)
Pierce, B.C.: Types and Programming Languages. MIT Press, Cambridge (2002)
Popper, K.R.: The Logic of Scientific Discovery. Routledge (2002). First English edition 1959; original German edition: Logik der Forschung, published 1935 by Julius Springer, Vienna, Austria
Prechelt, L., Tichy, W.F.: A controlled experiment to assess the benefits of procedure argument type checking. IEEE Trans. Softw. Eng. 24(4), 302–312 (1998)
Seaman, C.B.: Qualitative methods in empirical studies of software engineering. IEEE Trans. Softw. Eng. 25(4), 557–572 (1999)
Senn, S.S.: Cross-over Trials in Clinical Research. Statistics in Practice. Wiley, Chichester (1993)
Sheil, B.A.: The psychological study of programming. ACM Comput. Surv. 13(1), 101–120 (1981)
Shneiderman, B.: Software Psychology: Human Factors in Computer and Information Systems. Winthrop Publishers, Cambridge (1980)
Souza, C., Figueiredo, E.: How do programmers use optional typing?: An empirical study. In: Proceedings of the 13th International Conference on Modularity, MODULARITY 2014, pp. 109–120. ACM, New York (2014)
Spiza, S., Hanenberg, S.: Type names without static type checking already improve the usability of APIs (as long as the type names are correct): An empirical study. In: Proceedings of the 13th International Conference on Modularity, MODULARITY 2014, pp. 99–108. ACM, New York (2014)
Stefik, A., Hanenberg, S.: The programming language wars: Questions and responsibilities for the programming language community. In: Proceedings of the 2014 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software, Onward! 2014, pp. 283–299. ACM, New York (2014)
Stefik, A., Siebert, S.: An empirical investigation into programming language syntax. Trans. Comput. Educ. 13(4), 19:1–19:40 (2013)
Stuchlik, A., Hanenberg, S.: Static vs. dynamic type systems: An empirical study about the relationship between type casts and development time. In: Proceedings of the 7th Symposium on Dynamic Languages, DLS 2011, Portland, Oregon, pp. 97–106. ACM (2011)
Tichy, W.F.: Should computer scientists experiment more? IEEE Comput. 31, 32–40 (1998)
Uesbeck, P.M., Stefik, A., Hanenberg, S., Pedersen, J., Daleiden, P.: An empirical study on the impact of C++ lambdas and programmer experience. In: 38th International Conference on Software Engineering, ICSE 2016, Austin, TX, 14–22 May 2016. To appear (2016)
Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C., Regnell, B., Wesslén, A.: Experimentation in Software Engineering: An Introduction. Kluwer Academic Publishers, Norwell (2000)
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Hanenberg, S. (2017). Empirical, Human-Centered Evaluation of Programming and Programming Language Constructs: Controlled Experiments. In: Cunha, J., Fernandes, J., Lämmel, R., Saraiva, J., Zaytsev, V. (eds) Grand Timely Topics in Software Engineering. GTTSE 2015. Lecture Notes in Computer Science, vol. 10223. Springer, Cham. https://doi.org/10.1007/978-3-319-60074-1_3
Print ISBN: 978-3-319-60073-4
Online ISBN: 978-3-319-60074-1