Abstract
In experiments, researchers set up comparable situations in which they carefully manipulate variables and record people's behavior in each condition. Experiments are very effective at determining causation in controlled situations and complement techniques that investigate ongoing behavior in more natural settings. For example, experiments are excellent for determining whether increased audio quality reduces the blood pressure of participants in a video conference, and they can add important insights to the larger question of when people choose video conferences over audio-only ones.
Notes
1. Much of what makes for good experimental design centers on minimizing what are known as threats to internal validity. Throughout this chapter we address many of these, including construct validity, confounds, experimenter biases, selection and dropout biases, and statistical threats.
2. G*Power 3 is a specialized software tool for power analysis that has a wide range of features and is free for noncommercial use. It is available at http://www.gpower.hhu.de
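For readers who prefer to script such calculations, the core computation behind a two-group a priori power analysis can be approximated in a few lines. This is only a sketch using the normal approximation (the exact t-based answer G*Power reports is slightly larger); the effect size and power values below are conventional illustrative choices, not recommendations:

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample test
    detecting a standardized effect size d (normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value, e.g., 1.96 for alpha = .05
    z_beta = z(power)           # e.g., 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(n_per_group(0.5))  # medium effect: about 63 per group (G*Power reports 64)
```

For planning real studies, a dedicated tool such as G*Power is still preferable, since it uses the exact noncentral t distribution and covers many more designs.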
3. Here we present the Neyman–Pearson approach to hypothesis testing as opposed to Fisher's significance testing approach. Lehmann (1993) details the history and distinctions between these two common approaches.
4. We return to effect sizes and confidence intervals in the section "What constitutes good work," where we describe how they can be used to better express the magnitude of an effect and its real-world implications.
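As a sketch of what reporting an effect size looks like in practice, the snippet below computes Cohen's d (a standardized mean difference) and a simple normal-approximation confidence interval for the raw mean difference. The task-time data and condition names are hypothetical, and for small samples a t-based interval would be more appropriate:

```python
import math
from statistics import mean, stdev, NormalDist

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                       / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled

def mean_diff_ci(a, b, level=0.95):
    """Normal-approximation CI for the difference in means."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = mean(a) - mean(b)
    return diff - z * se, diff + z * se

# Hypothetical task-completion times (seconds) for two interface conditions.
control = [41, 38, 45, 50, 39, 44, 47, 42]
treatment = [35, 33, 40, 42, 31, 38, 36, 34]
print(cohens_d(control, treatment))     # positive: control is slower
print(mean_diff_ci(control, treatment))  # interval for the raw difference
```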
5. When using measures such as education level or test performance, be cautious of regression to the mean: make sure you are not assigning participants to levels of your independent variable based on their scores on the dependent variable, or on something strongly correlated with the DV (also known as sampling on the dependent variable) (Galton, 1886).
6.
7. When developing new measures, it is important to assess and report their reliability. This can be done using a variety of test–retest assessments.
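One common summary of test–retest reliability is simply the Pearson correlation between scores from two administrations of the same measure. A minimal sketch, with hypothetical scores from five participants measured a week apart:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two score lists of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores from the same 5 participants, one week apart.
test1 = [12, 15, 11, 18, 14]
test2 = [13, 14, 12, 17, 15]
print(round(pearson_r(test1, test2), 2))  # 0.95: high test-retest reliability
```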
8. Sara Kiesler and Jonathon Cummings provided this structured way to think about dependent variables and assessing forms of reliability and validity.
9. It should be noted that numerous surveys and questionnaires published in the HCI literature were not validated or did not make use of validated measures. While there is still some benefit to consistency in measurement, it is less clear in these cases that the measures validly capture the stated construct.
10. Lazar and colleagues (Lazar, Feng, & Hochheiser, 2010, pp. 28–30) provide a step-by-step discussion of how to use a random number table to assign participants to conditions in various experimental designs. In addition, numerous online resources exist to generate tables for random assignment to experimental conditions (e.g., http://www.graphpad.com/quickcalcs/randomize1.cfm).
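Random assignment is also easy to script directly. A minimal sketch using Python's standard library — the function name is ours, and the fixed seed is shown only to make the example reproducible (omit it in a real study):

```python
import random

def random_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them round-robin into
    (near-)equal-sized condition groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {c: shuffled[i::len(conditions)] for i, c in enumerate(conditions)}

groups = random_assign(range(1, 21), ["control", "treatment"], seed=42)
print({c: len(g) for c, g in groups.items()})  # 10 participants per condition
```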
11. There are numerous online resources for obtaining Latin square tables (e.g., http://statpages.org/latinsq.html).
12. This approach only balances for what are known as first-order sequential effects. There are still a number of ways in which repeated measurement can be systematically affected, such as nonlinear or asymmetric transfer effects. See Kirk (2013, Chap. 14) or other literature on Latin square or combinatorial designs for more details.
13. If your experiment has an odd number of conditions, then two balanced Latin squares are needed. The first square is generated using the same method described in the text, and the second square is a reversal of the first.
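The construction is mechanical enough to script. This sketch generates a balanced Latin square via the standard 1, 2, n, 3, n−1, … first-row method (Williams, 1949; Bradley, 1958), appending the reversed square when the number of conditions is odd, as described above:

```python
def balanced_latin_square(n):
    """Rows are condition orders (0-indexed). Each condition appears once
    per row and column, and each condition immediately follows every other
    condition equally often (first-order counterbalancing).
    For odd n, the reversal of each row is appended (2n rows total)."""
    first, lo, hi = [0, 1], 2, n - 1
    while len(first) < n:          # build first row: 0, 1, n-1, 2, n-2, ...
        first.append(hi)
        hi -= 1
        if len(first) < n:
            first.append(lo)
            lo += 1
    rows = [[(c + r) % n for c in first] for r in range(n)]
    if n % 2 == 1:
        rows += [list(reversed(row)) for row in rows]
    return rows

for row in balanced_latin_square(4):
    print(row)  # first row is [0, 1, 3, 2]
```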
14. As a side note, Latin square designs are a within-subjects version of a general class of designs known as fractional factorial designs. Fractional factorial designs are useful when you want to explore numerous factors at once but do not have the capacity to run hundreds or thousands of participants to cover the complete factorial (see Collins, Dziak, & Li, 2009).
15. In practice, mixed factorial designs are often used when examining different groups of participants (e.g., demographics, skills). For example, if you are interested in differences in user experience across three age groups, a between-subjects factor may be age group (teen, adult, elderly), while a within-subjects factor may be three different interaction styles.
16. Note that common transformations of the data (e.g., logarithmic or reciprocal transformations) can affect the detection and interpretation of interactions. Such transformations are performed when the data deviate from the distributional requirements of statistical tests, and researchers need to be cautious when interpreting results from transformed data.
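A tiny numerical example shows how this can happen. The 2×2 cell means below (hypothetical task times) follow a purely multiplicative pattern: on the raw scale the difference-of-differences contrast signals an interaction, but after a log transform the pattern is exactly additive and the interaction disappears:

```python
import math

# Hypothetical 2x2 cell means with a purely multiplicative structure.
raw = [[10.0, 20.0],   # factor A, level 1
       [40.0, 80.0]]   # factor A, level 2

def interaction_contrast(cells):
    """Difference-of-differences for a 2x2 design; zero means no interaction."""
    return (cells[1][1] - cells[1][0]) - (cells[0][1] - cells[0][0])

logged = [[math.log(v) for v in row] for row in raw]
print(interaction_contrast(raw))     # 30.0: interaction on the raw scale
print(interaction_contrast(logged))  # ~0: the interaction vanishes after log
```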
17. For factorial designs with more factors, higher-order interactions can mask lower-order effects.
18.
19. Time-series approaches have particular statistical concerns that must be addressed when analyzing the data. In particular, they often produce data points that exhibit various forms of autocorrelation, whereas many statistical analyses require that the data points be independent. There are numerous books and manuscripts on the proper treatment of time-series data, many of which reside in the domain of econometrics (Gujarati, 1995, pp. 707–754; Kennedy, 1998, pp. 263–287).
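As a quick diagnostic sketch, the lag-1 autocorrelation below measures how strongly each data point predicts the next one; values far from zero flag a violation of the independence assumption. Both example series are illustrative:

```python
def lag1_autocorr(x):
    """Lag-1 sample autocorrelation of a time series."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den

trending = list(range(20))    # e.g., steadily growing usage counts
alternating = [1, -1] * 10    # rapidly oscillating series
print(round(lag1_autocorr(trending), 2))     # 0.85: strongly positive
print(round(lag1_autocorr(alternating), 2))  # -0.95: strongly negative
```

Either result would caution against feeding such data directly into tests that assume independent observations.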
20. For a detailed discussion of interrupted time-series designs, see Shadish et al. (2002, pp. 171–206).
21. These are also known as A-B-A or withdrawal designs, and are similar to many approaches used for small-N or single-subject studies with multiple baselines. For further details, see Shadish et al. (2002, pp. 188–190).
22. We use a two-condition example for ease of exposition.
23. While we separate these three areas in order to discuss the relative contributions made in each, this is not to suggest that they are mutually exclusive categories. In fact, some of the most influential work has all three dimensions. For a more nuanced discussion of the integration of theoretical (basic) and practical (applied) research in an innovation context, see Stokes (1997), Pasteur's Quadrant.
24. Not all of these studies are strict randomized experiments. For example, the SHARK evaluation does not make use of a control or comparison group. However, many use experimental research techniques to effectively demonstrate the feasibility of their approach.
25. The framing questions in this section are drawn from Judy Olson's "10 questions that every graduate student should be able to answer." The list of questions and related commentary can be found at http://beki70.wordpress.com/2010/09/30/judy-olsons-10-questions-and-some-commentary/
References
Abelson, R. P. (1995). Statistics as principled argument. Hillsdale, NJ: L. Erlbaum Associates.
Accot, J., & Zhai, S. (1997). Beyond Fitts’ law: Models for trajectory-based HCI tasks. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 295–302). New York, NY: ACM.
American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: American Psychological Association.
Bao, P., & Gergle, D. (2009). What’s “this” you say?: The use of local references on distant displays. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1029–1032). New York, NY: ACM.
Bausell, R. B., & Li, Y.-F. (2002). Power analysis for experimental research: A practical guide for the biological, medical, and social sciences. Cambridge, NY: Cambridge University Press.
Bem, D. J. (2003). Writing the empirical journal article. In J. M. Darley, M. P. Zanna, & H. L. Roediger III (Eds.), The compleat academic: A practical guide for the beginning social scientist (2nd ed.). Washington, DC: American Psychological Association.
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. Chichester: Wiley.
Bradley, J. V. (1958). Complete counterbalancing of immediate sequential effects in a Latin square design. Journal of the American Statistical Association, 53(282), 525–528.
Campbell, D. T., Stanley, J. C., & Gage, N. L. (1963). Experimental and quasi-experimental designs for research. Boston, MA: Houghton Mifflin.
Carter, S., Mankoff, J., Klemmer, S., & Matthews, T. (2008). Exiting the cleanroom: On ecological validity and ubiquitous computing. Human–Computer Interaction, 23(1), 47–99.
Carver, R. P. (1993). The case against statistical significance testing, revisited. The Journal of Experimental Education, 61(4), 287–292.
Cochran, W. G., & Cox, G. M. (1957). Experimental designs. New York, NY: Wiley.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: L. Erlbaum Associates.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003.
Collins, L. M., Dziak, J. J., & Li, R. (2009). Design of experiments with multiple independent variables: A resource management perspective on complete and reduced factorial designs. Psychological Methods, 14(3), 202–224.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues for field settings. Chicago: Rand McNally.
Cosley, D., Lam, S. K., Albert, I., Konstan, J. A., & Riedl, J. (2003). Is seeing believing?: How recommender system interfaces affect users’ opinions. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 585–592). New York, NY: ACM.
Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61(4), 532–574.
Czerwinski, M., Tan, D. S., & Robertson, G. G. (2002). Women take a wider view. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 195–202). New York, NY: ACM.
Dabbish, L., Kraut, R., & Patton, J. (2012). Communication and commitment in an online game team. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 879–888). New York, NY: ACM.
Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. Cambridge, NY: Cambridge University Press.
Evans, A., & Wobbrock, J. O. (2012). Taming wild behavior: The input observer for text entry and mouse pointing measures from everyday computer use. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1947–1956). New York, NY: ACM.
Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd.
Fisher, R. A., & Yates, F. (1953). Statistical tables for biological, agricultural and medical research. Edinburgh: Oliver & Boyd.
Galton, F. (1886). Regression towards mediocrity in hereditary stature. The Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246–263.
Gergle, D., Kraut, R. E., & Fussell, S. R. (2013). Using visual information for grounding and awareness in collaborative tasks. Human–Computer Interaction, 28(1), 1–39.
Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Lawrence Erlbaum Associates.
Gujarati, D. N. (1995). Basic econometrics. New York, NY: McGraw-Hill.
Gutwin, C., & Penner, R. (2002). Improving interpretation of remote gestures with telepointer traces. In Proceedings of the ACM SIGCHI conference on computer supported cooperative work (pp. 49–57). New York, NY: ACM.
Hancock, J. T., Landrigan, C., & Silver, C. (2007). Expressing emotion in text-based communication. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 929–932). New York, NY: ACM.
Hancock, G. R., & Mueller, R. O. (2010). The reviewer’s guide to quantitative methods in the social sciences. New York, NY: Routledge.
Harrison, C., Tan, D., & Morris, D. (2010). Skinput: Appropriating the body as an input surface. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 453–462). New York, NY: ACM.
Hornbæk, K. (2011). Some whys and hows of experiments in Human–Computer Interaction. Foundations and Trends in Human–Computer Interaction, 5(4), 299–373.
Johnson, D. H. (1999). The insignificance of statistical significance testing. The Journal of Wildlife Management, 63, 763–772.
Keegan, B., & Gergle, D. (2010). Egalitarians at the gate: One-sided gatekeeping practices in social media. In Proceedings of the ACM SIGCHI conference on computer supported cooperative work (pp. 131–134). New York, NY: ACM.
Kelley, K., & Preacher, K. J. (2012). On effect size. Psychological Methods, 17(2), 137–152.
Kenny, D. A. (1987). Statistics for the social and behavioral sciences. Boston, MA: Little, Brown.
Kennedy, P. (1998). A guide to econometrics. Cambridge, MA: The MIT Press.
Kirk, R. E. (1982). Experimental design: Procedures for the behavioral sciences (2nd ed.). Monterey, CA: Brooks/Cole.
Kirk, R. E. (2013). Experimental design: Procedures for the behavioral sciences. Thousand Oaks, CA: Sage.
Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. Washington, DC: American Psychological Association.
Kline, R. B. (2013). Beyond significance testing: Statistics reform in the behavioral sciences. Washington, DC: American Psychological Association.
Koedinger, K. R., Anderson, J. R., Hadley, W. H., & Mark, M. A. (1997). Intelligent tutoring goes to school in the big city. International Journal of Artificial Intelligence in Education, 8, 30–43.
Kohavi, R., Henne, R. M., & Sommerfield, D. (2007). Practical guide to controlled experiments on the web: Listen to your customers not to the hippo. In Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (pp. 959–967). New York, NY: ACM.
Kohavi, R., & Longbotham, R. (2007). Online experiments: Lessons learned. Computer, 40(9), 103–105.
Kohavi, R., Longbotham, R., & Walker, T. (2010). Online experiments: Practical lessons. Computer, 43(9), 82–85.
Kristensson, P.-O., & Zhai, S. (2004). SHARK^2: A large vocabulary shorthand writing system for pen-based computers. In Proceedings of the ACM symposium on user interface software and technology (pp. 43–52). New York, NY: ACM.
Kruschke, J. K. (2010). What to believe: Bayesian methods for data analysis. Trends in Cognitive Sciences, 14(7), 293–300.
Lazar, J., Feng, J. H., & Hochheiser, H. (2010). Research methods in human-computer interaction. Chichester: Wiley.
Lehmann, E. L. (1993). The Fisher, Neyman–Pearson theories of testing hypotheses: One theory or two? Journal of the American Statistical Association, 88(424), 1242–1249.
Lieberman, H. (2003). The tyranny of evaluation. Retrieved August 15, 2012, from http://web.media.mit.edu/~lieber/Misc/Tyranny-Evaluation.html
MacKenzie, I. S., & Zhang, S. X. (1999). The design and evaluation of a high-performance soft keyboard. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 25–31). New York, NY: ACM.
Martin, D. W. (2004). Doing psychology experiments. Belmont, CA: Thomson/Wadsworth.
McLeod, P. L. (1992). An assessment of the experimental literature on electronic support of group work: Results of a meta-analysis. Human–Computer Interaction, 7(3), 257–280.
Nguyen, D., & Canny, J. (2005). MultiView: Spatially faithful group video conferencing. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 799–808). New York, NY: ACM.
Olson, J. S., Olson, G. M., Storrøsten, M., & Carter, M. (1993). Groupwork close up: A comparison of the group design process with and without a simple group editor. ACM Transactions on Information Systems, 11(4), 321–348.
Oulasvirta, A. (2009). Field experiments in HCI: Promises and challenges. In P. Saariluoma & H. Isomaki (Eds.), Future interaction design II. New York, NY: Springer.
Oulasvirta, A., Tamminen, S., Roto, V., & Kuorelahti, J. (2005). Interaction in 4-second bursts: The fragmented nature of attentional resources in mobile HCI. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 919–928). New York, NY: ACM.
Rogers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using significance tests to evaluate equivalence between two experimental groups. Psychological Bulletin, 113(3), 553.
Rosenthal, R., & Rosnow, R. L. (2008). Essentials of behavioral research: Methods and data analysis (3rd ed.). New York, NY: McGraw-Hill.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.
Smithson, M. (2003). Confidence intervals. Thousand Oaks, CA: Sage.
Stokes, D. E. (1997). Pasteur’s quadrant: Basic science and technological innovation. Washington, DC: Brookings Institution Press.
Tan, D. S., Gergle, D., Scupelli, P., & Pausch, R. (2003). With similar visual angles, larger displays improve spatial performance. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 217–224). New York, NY: ACM.
Tan, D. S., Gergle, D., Scupelli, P. G., & Pausch, R. (2004). Physically large displays improve path integration in 3D virtual navigation tasks. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 439–446). New York, NY: ACM.
Tan, D. S., Gergle, D., Scupelli, P., & Pausch, R. (2006). Physically large displays improve performance on spatial tasks. ACM Transactions on Computer Human Interaction, 13(1), 71–99.
Veinott, E. S., Olson, J., Olson, G. M., & Fu, X. (1999). Video helps remote work: Speakers who need to negotiate common ground benefit from seeing each other. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 302–309). New York, NY: ACM.
Weir, P. (Director). (1998). The Truman show [Motion picture]. Hollywood, CA: Paramount Pictures.
Weisband, S., & Kiesler, S. (1996). Self disclosure on computer forms: Meta-analysis and implications. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 3–10). New York, NY: ACM.
Weiss, N. A. (2008). Introductory statistics. San Francisco, CA: Pearson Addison-Wesley.
Wigdor, D., Forlines, C., Baudisch, P., Barnwell, J., & Shen, C. (2007). Lucid touch: A see-through mobile device. In Proceedings of the ACM symposium on user interface software and technology (pp. 269–278). New York, NY: ACM.
Williams, E. J. (1949). Experimental designs balanced for the estimation of residual effects of treatments. Australian Journal of Chemistry, 2(2), 149–168.
Wilson, M. L., Mackay, W., Chi, E., Bernstein, M., & Nichols, J. (2012). RepliCHI SIG: From a panel to a new submission venue for replication. In Proceedings of the ACM conference extended abstracts on human factors in computing systems (pp. 1185–1188). New York, NY: ACM.
Wobbrock, J. O. (2011). Practical statistics for human-computer interaction: An independent study combining statistics theory and tool know-how. Presented at the Annual workshop of the Human-Computer Interaction Consortium (HCIC ’11). Pacific Grove, CA.
Wobbrock, J. O., Cutrell, E., Harada, S., & MacKenzie, I. S. (2008). An error model for pointing based on Fitts’ law. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1613–1622). New York, NY: ACM.
Yee, N., Bailenson, J. N., & Rickertsen, K. (2007). A meta-analysis of the impact of the inclusion and realism of human-like faces on user experiences in interfaces. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1–10). New York, NY: ACM.
Zhai, S. (2003). Evaluation is the worst form of HCI research except all those other forms that have been tried. Retrieved February 18, 2014, from http://shuminzhai.com/papers/EvaluationDemocracy.htm
Zhai, S., & Kristensson, P.-O. (2003). Shorthand writing on stylus keyboard. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 97–104). New York, NY: ACM.
Zhu, H., Kraut, R., & Kittur, A. (2012). Effectiveness of shared leadership in online communities. In Proceedings of the ACM SIGCHI conference on computer supported cooperative work (pp. 407–416). New York, NY: ACM.
Acknowledgements
We would like to thank Wendy Kellogg, Robert Kraut, Anne Oeldorf-Hirsch, Gary Olson, Judy Olson, and Lauren Scissors for their thoughtful reviews and comments on the chapter.
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Gergle, D., Tan, D.S. (2014). Experimental Research in HCI. In: Olson, J., Kellogg, W. (eds) Ways of Knowing in HCI. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0378-8_9
DOI: https://doi.org/10.1007/978-1-4939-0378-8_9
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-0377-1
Online ISBN: 978-1-4939-0378-8