Empirical, Human-Centered Evaluation of Programming and Programming Language Constructs: Controlled Experiments

  • Conference paper
Grand Timely Topics in Software Engineering (GTTSE 2015)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10223)

Abstract

While the application of empirical methods has a long tradition in domains such as performance evaluation, the application of empirical methods with human subjects to evaluate the usability of programming techniques, programming language constructs, or whole programming languages is relatively new (or, at least, running such studies is becoming more common). Despite the urgent need for such usability studies, few researchers are well-versed in these techniques, certainly when compared to the large number of researchers inventing new programming techniques or formal approaches. The main goal of this text is to introduce empirical methods for evaluating programming language constructs, with a strong focus on quantitative methods. The paper concludes by explaining how and why a series of controlled experiments was gradually designed to study the usability of type systems.


Notes

  1.

    According to Hanenberg [14], the phrase software science is used to describe research on software artifacts in general. While the term software engineering is used much more often, the programming language community in particular, as well as people doing performance measurements, feel that this term does not adequately describe their domains. We think that the term software science, although originally used by Halstead [12] for something different, is more appropriate for describing the whole domain of software-related research.

  2.

    Sheil even called the study of programming, as practiced by computer science, ‘an unholy mixture of mathematics, literary criticism, and folklore’ [37, p. 102].

  3.

    It should be noted that just recently a study appeared which was not able to reveal a measurable benefit of lambda expressions in C++. Instead, the study showed, at least for non-professional programmers, a measurable disadvantage (see [45]).

  4.

    Again, to get an impression of how far from mainstream this is: according to Kaijanaho, the number of randomized controlled trials on the human-factors comparative evaluation of language features up to 2012 was 22 (see [22, p. 143]).

  5.

    Additionally, the collection of Victor Basili’s papers edited by Boehm et al. [1] gives a larger set of examples of controlled trials that have been performed.

  6.

    www.ibm.com/software/analytics/spss/.

  7.

    https://www.r-project.org/.

  8.

    The corresponding non-parametric tests [5] are valid here, too, i.e. it is possible to analyse the crossover trial using a U-test and a Wilcoxon test (a minimal sketch of such an analysis is given after these notes).

  9.

    The rather arbitrary choice of .05 is probably so common because it was originally proposed by Fisher [8], although some other disciplines use a different alpha level.

  10.

    The points are word-by-word citations from Souza and Figueiredo [39].

  11.

    Two other questions are formulated, which are skipped here for reasons of simplification.

  12.

    It is understandable that the authors do not apply inference-statistical methods: a huge number of different words is being tested, and it sounds plausible that traditional approaches from inference statistics would not have revealed any differences at all because of the high number of variables (see the second sketch after these notes).

  13.

    The authors distinguish in their paper between a third and a fourth study, which we present here as one because the hypotheses and applied analysis methods were identical.

  14.

    At least, this statement can be found in the work by Kaijanaho [22].

  15.

    The result of the experiment was that the additional type annotations of generic Java helped when using an undocumented API – which (again) confirmed the previous findings – but it also showed a situation where generic types reduced the extensibility of an API (see [20]).

  16.

    Which was the result of the replication study by Kleinschmager et al. [16, 24].
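
To illustrate note 8, the following is a minimal sketch of a non-parametric analysis of a two-period crossover trial. It uses Python with SciPy and entirely hypothetical task-completion times; both the tooling and the data are assumptions for illustration only (the chapter itself only names the tests and points to tools such as SPSS and R).

    # Minimal sketch (hypothetical data): non-parametric analysis of a
    # two-period crossover trial. Group A uses technique X first, then Y;
    # group B uses Y first, then X. Values are task-completion times in minutes.
    from scipy.stats import mannwhitneyu, wilcoxon

    group_a_period1_x = [12.4, 15.1, 9.8, 14.0, 11.2]   # group A, technique X
    group_a_period2_y = [10.9, 13.5, 9.1, 12.2, 10.0]   # group A, technique Y
    group_b_period1_y = [11.5, 14.2, 10.3, 13.8, 12.9]  # group B, technique Y
    group_b_period2_x = [13.0, 15.9, 11.8, 15.1, 13.4]  # group B, technique X

    # Between-subjects comparison of the first period only (unpaired):
    # the Mann-Whitney U-test, the non-parametric analogue of the t-test.
    u_stat, p_between = mannwhitneyu(group_a_period1_x, group_b_period1_y,
                                     alternative="two-sided")

    # Within-subjects comparison across both periods (paired):
    # the Wilcoxon signed-rank test on per-participant differences X - Y.
    x_times = group_a_period1_x + group_b_period2_x
    y_times = group_a_period2_y + group_b_period1_y
    w_stat, p_within = wilcoxon(x_times, y_times)

    print(f"U-test (first period, between groups): p = {p_between:.3f}")
    print(f"Wilcoxon test (within subjects):       p = {p_within:.3f}")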
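To illustrate note 12, the following minimal sketch shows why testing a large number of variables at once makes it hard for classical inference statistics to reveal differences. It assumes a Bonferroni correction and hypothetical numbers, neither of which is taken from the study discussed in the note.

    # Minimal sketch (hypothetical numbers): with a family-wise error rate of
    # 0.05 and many tested variables (here: words), the Bonferroni-corrected
    # per-test alpha becomes so small that individual tests rarely reach it.
    alpha = 0.05          # family-wise error rate
    num_words = 500       # hypothetical number of tested words
    per_test_alpha = alpha / num_words

    print(f"Per-test alpha with {num_words} words: {per_test_alpha:.5f}")
    # Any single word's difference would need p < 0.0001 to count as significant.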

References

  1. Boehm, B., Rombach, H.D., Zelkowitz, M.V.: Foundations of Empirical Software Engineering: The Legacy of Victor R. Basili. Springer, Heidelberg (2005)

  2. Bracha, G.: Pluggable type systems. In: OOPSLA’04 Workshop on Revival of Dynamic Languages (2004)

  3. Bruce, K.B.: Foundations of Object-Oriented Languages: Types and Semantics. MIT Press, Cambridge (2002)

  4. Cohen, J.: Statistical Power Analysis for the Behavioral Sciences. L. Erlbaum Associates, Hillsdale (1988)

  5. Conover, W.J.: Practical Nonparametric Statistics, 3rd edn. Wiley, New York (1998)

  6. Endrikat, S., Hanenberg, S., Robbes, R., Stefik, A.: How do API documentation and static typing affect API usability? In: 36th International Conference on Software Engineering, ICSE 2014, Hyderabad, India - 31 May–07 June 2014, pp. 632–642 (2014)

  7. Fischer, L., Hanenberg, S.: An empirical investigation of the effects of type systems and code completion on API usability using TypeScript and JavaScript in MS Visual Studio. In: Proceedings of the Dynamic Languages Symposium, DLS 2015. Accepted for publication (2015)

  8. Fisher, R.A.: Statistical Methods for Research Workers. Cosmo Study Guides. Cosmo Publications, New Delhi (1925)

  9. Gannon, J.D.: An experimental evaluation of data type conventions. Commun. ACM 20(8), 584–595 (1977)

  10. Georges, A., Buytaert, D., Eeckhout, L.: Statistically rigorous Java performance evaluation. SIGPLAN Not. 42(10), 57–76 (2007)

  11. Glaser, B.G., Strauss, A.L.: The Discovery of Grounded Theory: Strategies for Qualitative Research. Aldine Publishing Company, Chicago (1967). Observations

  12. Halstead, M.H.: Elements of Software Science (Operating and Programming Systems Series). Elsevier Science Inc., New York (1977)

  13. Hanenberg, S.: An experiment about static and dynamic type systems: Doubts about the positive impact of static type systems on development time. In: Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA, pp. 22–35. ACM, New York (2010)

  14. Hanenberg, S.: Faith, hope, and love: An essay on software science’s neglect of human factors. In: Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages And Applications, OOPSLA 2010, pp. 933–946. Reno/Tahoe, Nevada, October 2010

  15. Hanenberg, S.: Why do we know so little about programming languages, and what would have happened if we had known more? In: Proceedings of the 10th ACM Symposium on Dynamic Languages, DLS 2014, p. 1. ACM, New York (2014)

  16. Hanenberg, S., Kleinschmager, S., Robbes, R., Tanter, É., Stefik, A.: An empirical study on the impact of static typing on software maintainability. Empirical Softw. Eng. 19(5), 1335–1382 (2014)

  17. Hanenberg, S., Stefik, A.: On the need to define community agreements for controlled experiments with human subjects - a discussion paper. In: Submitted to PLATEAU 2015 (2015)

  18. Harlow, L.L., Mulaik, S.A., Steiger, J.H.: What If There Were No Significance Tests? Multivariate Applications Book Series. Lawrence Erlbaum Associates Publishers, Hillsdale (1997)

  19. Hoda, R., Noble, J., Marshall, S.: Developing a grounded theory to explain the practices of self-organizing agile teams. Empirical Softw. Eng. 17(6), 609–639 (2012)

  20. Hoppe, M., Hanenberg, S.: Do developers benefit from generic types? An empirical comparison of generic and raw types in Java. In: Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA 2013, pp. 457–474. ACM, New York (2013)

  21. Juristo, N., Moreno, A.M.: Basics of Software Engineering Experimentation. Springer, Heidelberg (2001)

  22. Kaijanaho, A.-J.: Evidence-based programming language design: A philosophical and methodological exploration. Number 222 in Jyväskylä Studies in Computing. University of Jyväskylä, Finland (2015)

  23. Kirk, R.E.: Experimental Design: Procedures for the Behavioral Sciences. SAGE Publications, Thousand Oaks (2012)

  24. Kleinschmager, S., Hanenberg, S., Robbes, R., Tanter, É., Stefik, A.: Do static type systems improve the maintainability of software systems? An empirical study. In: IEEE 20th International Conference on Program Comprehension, ICPC 2012, Passau, Germany, pp. 153–162, 11–13 June 2012

  25. Ko, A.J., LaToza, T.D., Burnett, M.M.: A practical guide to controlled experiments of software engineering tools with human participants. Empirical Softw. Eng. 20(1), 110–141 (2015)

  26. Laprie, J.-C.: Dependability of computer systems: concepts, limits, improvements. In: Sixth International Symposium on Software Reliability Engineering, ISSRE 1995, Toulouse, France, 24–27 October 1995, pp. 2–11 (1995)

  27. Mayer, C., Hanenberg, S., Robbes, R., Tanter, É., Stefik, A.: An empirical study of the influence of static type systems on the usability of undocumented software. In: Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2012, part of SPLASH 2012, Tucson, AZ, USA, 21–25 October 2012, pp. 683–702. ACM (2012)

  28. McConnell, S.: What does 10x mean? Measuring variations in programmer productivity. In: Oram, A., Wilson, G. (eds.) Making Software: What Really Works, and Why We Believe It, O’Reilly Series, pp. 567–575. O’Reilly Media (2010)

  29. Okon, S., Hanenberg, S.: Can we enforce a benefit for dynamically typed languages in comparison to statically typed ones? A controlled experiment. In: 2016 IEEE 24th International Conference on Program Comprehension (ICPC), pp. 1–10, May 2016

  30. Parnin, C., Bird, C., Murphy-Hill, E.R.: Java generics adoption: How new features are introduced, championed, or ignored. In: Proceedings of the 8th International Working Conference on Mining Software Repositories, MSR 2011 (Co-located with ICSE), Waikiki, Honolulu, HI, USA, 21–28 May 2011, pp. 3–12. IEEE (2011)

  31. Petersen, P., Hanenberg, S., Robbes, R.: An empirical comparison of static and dynamic type systems on API usage in the presence of an IDE: Java vs. Groovy with Eclipse. In: 22nd International Conference on Program Comprehension, ICPC 2014, Hyderabad, India, 2–3 June 2014, pp. 212–222 (2014)

  32. Pierce, B.C.: Types and Programming Languages. MIT Press, Cambridge (2002)

  33. Popper, K.R.: The Logic of Scientific Discovery. Routledge (2002). 1st English edition: 1959; original first edition (German): Logik der Forschung, published 1935 by Julius Springer, Vienna, Austria

  34. Prechelt, L., Tichy, W.F.: A controlled experiment to assess the benefits of procedure argument type checking. IEEE Trans. Softw. Eng. 24(4), 302–312 (1998)

  35. Seaman, C.B.: Qualitative methods in empirical studies of software engineering. IEEE Trans. Software Eng. 25(4), 557–572 (1999)

  36. Senn, S.S.: Cross-over Trials in Clinical Research. Statistics in Practice. Wiley, Chichester (1993)

  37. Sheil, B.A.: The psychological study of programming. ACM Comput. Surv. 13(1), 101–120 (1981)

  38. Shneiderman, B.: Software Psychology: Human Factors in Computer and Information Systems. Winthrop Publishers, Cambridge (1980)

  39. Souza, C., Figueiredo, E.: How do programmers use optional typing?: An empirical study. In: Proceedings of the 13th International Conference on Modularity, MODULARITY 2014, pp. 109–120. ACM, New York (2014)

  40. Spiza, S., Hanenberg, S.: Type names without static type checking already improve the usability of APIs (as long as the type names are correct): An empirical study. In: Proceedings of the 13th International Conference on Modularity, MODULARITY 2014, pp. 99–108. ACM, New York (2014)

  41. Stefik, A., Hanenberg, S.: The programming language wars: Questions and responsibilities for the programming language community. In: Proceedings of the 2014 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software, Onward! 2014, pp. 283–299. ACM, New York (2014)

  42. Stefik, A., Siebert, S.: An empirical investigation into programming language syntax. Trans. Comput. Educ. 13(4), 19:1–19:40 (2013)

  43. Stuchlik, A., Hanenberg, S.: Static vs. dynamic type systems: An empirical study about the relationship between type casts and development time. In: Proceedings of the 7th Symposium on Dynamic Languages, DLS 2011, Portland, Oregon, pp. 97–106. ACM (2011)

  44. Tichy, W.F.: Should computer scientists experiment more? IEEE Comput. 31, 32–40 (1998)

  45. Uesbeck, P.M., Stefik, A., Hanenberg, S., Pedersen, J., Daleiden, P.: An empirical study on the impact of C++ lambdas and programmer experience. In: 38th International Conference on Software Engineering, ICSE 2016, Austin, TX, 14–22 May 2016. To appear (2016)

  46. Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C., Regnell, B., Wesslén, A.: Experimentation in Software Engineering: An Introduction. Kluwer Academic Publishers, Norwell (2000)

Author information

Correspondence to Stefan Hanenberg.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Hanenberg, S. (2017). Empirical, Human-Centered Evaluation of Programming and Programming Language Constructs: Controlled Experiments. In: Cunha, J., Fernandes, J., Lämmel, R., Saraiva, J., Zaytsev, V. (eds) Grand Timely Topics in Software Engineering. GTTSE 2015. Lecture Notes in Computer Science, vol 10223. Springer, Cham. https://doi.org/10.1007/978-3-319-60074-1_3

  • DOI: https://doi.org/10.1007/978-3-319-60074-1_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60073-4

  • Online ISBN: 978-3-319-60074-1

  • eBook Packages: Computer Science, Computer Science (R0)
