Abstract
In experiments, researchers set up comparable situations in which they carefully manipulate variables and record people's behavior in each condition. Experiments are very effective at determining causation in controlled situations and complement techniques that investigate ongoing behavior in more natural settings. For example, experiments are excellent for determining whether increased audio quality reduces the blood pressure of participants in a video conference, and they can add important insights to the larger question of when people choose video conferences over audio-only ones.
Notes
1. Much of what makes for good experimental design centers on minimizing what are known as threats to internal validity. Throughout this chapter we address many of these, including construct validity, confounds, experimenter biases, selection and dropout biases, and statistical threats.
2. G*Power 3 is a specialized software tool for power analysis that has a wide range of features and is free for noncommercial use. It is available at http://www.gpower.hhu.de
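For readers who prefer to script such calculations, the core computation behind a two-group a priori power analysis can be approximated in a few lines. This is only a sketch using the normal approximation (the exact t-based answer G*Power reports is slightly larger); the effect size and power values below are conventional illustrative choices, not recommendations:

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample test
    detecting a standardized effect size d (normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value, e.g., 1.96 for alpha = .05
    z_beta = z(power)           # e.g., 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(n_per_group(0.5))  # medium effect: about 63 per group (G*Power reports 64)
```

For planning real studies, a dedicated tool such as G*Power is still preferable, since it uses the exact noncentral t distribution and covers many more designs.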
3. Here we present the Neyman–Pearson approach to hypothesis testing as opposed to Fisher's significance testing approach. Lehmann (1993) details the history and distinctions between these two common approaches.
4. We return to effect sizes and confidence intervals in the section "What constitutes good work," where we describe how they can be used to better express the magnitude of an effect and its real-world implications.
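As a sketch of what reporting an effect size looks like in practice, the snippet below computes Cohen's d (a standardized mean difference) and a simple normal-approximation confidence interval for the raw mean difference. The task-time data and condition names are hypothetical, and for small samples a t-based interval would be more appropriate:

```python
import math
from statistics import mean, stdev, NormalDist

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                       / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled

def mean_diff_ci(a, b, level=0.95):
    """Normal-approximation CI for the difference in means."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = mean(a) - mean(b)
    return diff - z * se, diff + z * se

# Hypothetical task-completion times (seconds) for two interface conditions.
control = [41, 38, 45, 50, 39, 44, 47, 42]
treatment = [35, 33, 40, 42, 31, 38, 36, 34]
print(cohens_d(control, treatment))     # positive: control is slower
print(mean_diff_ci(control, treatment))  # interval for the raw difference
```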
5. When using measures such as education level or test performance, be cautious of regression to the mean: make sure you are not assigning participants to levels of your independent variable based on their scores on the dependent variable, or on something strongly correlated with the DV (also known as sampling on the dependent variable) (Galton, 1886).
6.
7. When developing new measures, it is important to assess and report their reliability. This can be done using a variety of test–retest assessments.
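One common summary of test–retest reliability is simply the Pearson correlation between scores from two administrations of the same measure. A minimal sketch, with hypothetical scores from five participants measured a week apart:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two score lists of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores from the same 5 participants, one week apart.
test1 = [12, 15, 11, 18, 14]
test2 = [13, 14, 12, 17, 15]
print(round(pearson_r(test1, test2), 2))  # 0.95: high test-retest reliability
```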
8. Sara Kiesler and Jonathon Cummings provided this structured way to think about dependent variables and assessing forms of reliability and validity.
9. It should be noted that numerous surveys and questionnaires published in the HCI literature were not validated or did not make use of validated measures. While there is still some benefit to consistency in measurement, it is less clear in these cases that the measures validly capture the stated construct.
10. Lazar and colleagues (Lazar, Feng, & Hochheiser, 2010, pp. 28–30) provide a step-by-step discussion of how to use a random number table to assign participants to conditions in various experimental designs. In addition, numerous online resources exist to generate tables for random assignment to experimental conditions (e.g., http://www.graphpad.com/quickcalcs/randomize1.cfm).
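Random assignment is also easy to script directly. A minimal sketch using Python's standard library — the function name is ours, and the fixed seed is shown only to make the example reproducible (omit it in a real study):

```python
import random

def random_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them round-robin into
    (near-)equal-sized condition groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {c: shuffled[i::len(conditions)] for i, c in enumerate(conditions)}

groups = random_assign(range(1, 21), ["control", "treatment"], seed=42)
print({c: len(g) for c, g in groups.items()})  # 10 participants per condition
```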
11. There are numerous online resources for obtaining Latin square tables (e.g., http://statpages.org/latinsq.html).
12. This approach only balances for what are known as first-order sequential effects. There are still a number of ways in which repeated measurement can be systematically affected, such as nonlinear or asymmetric transfer effects. See Kirk (2013, Chap. 14) or other literature on Latin square or combinatorial designs for more details.
13. If your experiment has an odd number of conditions, then two balanced Latin squares are needed. The first square is generated using the same method described in the text, and the second square is a reversal of the first.
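The construction is mechanical enough to script. This sketch generates a balanced Latin square via the standard 1, 2, n, 3, n−1, … first-row method (Williams, 1949; Bradley, 1958), appending the reversed square when the number of conditions is odd, as described above:

```python
def balanced_latin_square(n):
    """Rows are condition orders (0-indexed). Each condition appears once
    per row and column, and each condition immediately follows every other
    condition equally often (first-order counterbalancing).
    For odd n, the reversal of each row is appended (2n rows total)."""
    first, lo, hi = [0, 1], 2, n - 1
    while len(first) < n:          # build first row: 0, 1, n-1, 2, n-2, ...
        first.append(hi)
        hi -= 1
        if len(first) < n:
            first.append(lo)
            lo += 1
    rows = [[(c + r) % n for c in first] for r in range(n)]
    if n % 2 == 1:
        rows += [list(reversed(row)) for row in rows]
    return rows

for row in balanced_latin_square(4):
    print(row)  # first row is [0, 1, 3, 2]
```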
14. As a side note, Latin square designs are a within-subjects version of a general class of designs known as fractional factorial designs. Fractional factorial designs are useful when you want to explore numerous factors at once but do not have the capacity to run hundreds or thousands of participants to cover the complete factorial (see Collins, Dziak, & Li, 2009).
15. In practice, mixed factorial designs are often used when examining different groups of participants (e.g., demographics, skills). For example, if you are interested in differences in user experience across three age groups, a between-subjects factor may be age group (teen, adult, elderly), while a within-subjects factor may be three different interaction styles.
16. Note that common transformations of the data (e.g., logarithmic or reciprocal transformations) can affect the detection and interpretation of interactions. Such transformations are performed when the data deviate from the distributional requirements of statistical tests, and researchers need to be cautious when interpreting results from transformed data.
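A tiny numerical example shows how this can happen. The 2×2 cell means below (hypothetical task times) follow a purely multiplicative pattern: on the raw scale the difference-of-differences contrast signals an interaction, but after a log transform the pattern is exactly additive and the interaction disappears:

```python
import math

# Hypothetical 2x2 cell means with a purely multiplicative structure.
raw = [[10.0, 20.0],   # factor A, level 1
       [40.0, 80.0]]   # factor A, level 2

def interaction_contrast(cells):
    """Difference-of-differences for a 2x2 design; zero means no interaction."""
    return (cells[1][1] - cells[1][0]) - (cells[0][1] - cells[0][0])

logged = [[math.log(v) for v in row] for row in raw]
print(interaction_contrast(raw))     # 30.0: interaction on the raw scale
print(interaction_contrast(logged))  # ~0: the interaction vanishes after log
```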
17. For factorial designs with more factors, higher-order interactions can mask lower-order effects.
18.
19. Time-series approaches have particular statistical concerns that must be addressed when analyzing the data. In particular, they often produce data points that exhibit various forms of autocorrelation, whereas many statistical analyses require that the data points be independent. There are numerous books and manuscripts on the proper treatment of time-series data, many of which reside in the domain of econometrics (Gujarati, 1995, pp. 707–754; Kennedy, 1998, pp. 263–287).
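As a quick diagnostic sketch, the lag-1 autocorrelation below measures how strongly each data point predicts the next one; values far from zero flag a violation of the independence assumption. Both example series are illustrative:

```python
def lag1_autocorr(x):
    """Lag-1 sample autocorrelation of a time series."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den

trending = list(range(20))    # e.g., steadily growing usage counts
alternating = [1, -1] * 10    # rapidly oscillating series
print(round(lag1_autocorr(trending), 2))     # 0.85: strongly positive
print(round(lag1_autocorr(alternating), 2))  # -0.95: strongly negative
```

Either result would caution against feeding such data directly into tests that assume independent observations.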
20. For a detailed discussion of interrupted time-series designs, see Shadish et al. (2002, pp. 171–206).
21. These are also known as A-B-A or withdrawal designs, and are similar to many approaches used for small-N or single-subject studies with multiple baselines. For further details, see Shadish et al. (2002, pp. 188–190).
22. We use a two-condition example for ease of exposition.
23. While we separate these three areas in order to discuss the relative contributions made in each, this is not to suggest that they are mutually exclusive categories. In fact, some of the most influential work has all three dimensions. For a more nuanced discussion of the integration of theoretical (basic) and practical (applied) research in an innovation context, see Stokes (1997), Pasteur's Quadrant.
24. Not all of these studies are strict randomized experiments. For example, the SHARK evaluation does not make use of a control or comparison group. However, many use experimental research techniques to effectively demonstrate the feasibility of their approach.
25. The framing questions in this section are drawn from Judy Olson's "10 questions that every graduate student should be able to answer." The list of questions and related commentary can be found at http://beki70.wordpress.com/2010/09/30/judy-olsons-10-questions-and-some-commentary/
References
Abelson, R. P. (1995). Statistics as principled argument. Hillsdale, NJ: L. Erlbaum Associates.
Accot, J., & Zhai, S. (1997). Beyond Fitts’ law: Models for trajectory-based HCI tasks. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 295–302). New York, NY: ACM.
American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: American Psychological Association.
Bao, P., & Gergle, D. (2009). What’s “this” you say?: The use of local references on distant displays. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1029–1032). New York, NY: ACM.
Bausell, R. B., & Li, Y.-F. (2002). Power analysis for experimental research: A practical guide for the biological, medical, and social sciences. Cambridge, NY: Cambridge University Press.
Bem, D. J. (2003). Writing the empirical journal article. In J. M. Darley, M. P. Zanna, & H. L. Roediger III (Eds.), The compleat academic: A practical guide for the beginning social scientist (2nd ed.). Washington, DC: American Psychological Association.
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. Chichester: Wiley.
Bradley, J. V. (1958). Complete counterbalancing of immediate sequential effects in a Latin square design. Journal of the American Statistical Association, 53(282), 525–528.
Campbell, D. T., Stanley, J. C., & Gage, N. L. (1963). Experimental and quasi-experimental designs for research. Boston, MA: Houghton Mifflin.
Carter, S., Mankoff, J., Klemmer, S., & Matthews, T. (2008). Exiting the cleanroom: On ecological validity and ubiquitous computing. Human–Computer Interaction, 23(1), 47–99.
Carver, R. P. (1993). The case against statistical significance testing, revisited. The Journal of Experimental Education, 61(4), 287–292.
Cochran, W. G., & Cox, G. M. (1957). Experimental designs. New York, NY: Wiley.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: L. Erlbaum Associates.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003.
Collins, L. M., Dziak, J. J., & Li, R. (2009). Design of experiments with multiple independent variables: A resource management perspective on complete and reduced factorial designs. Psychological Methods, 14(3), 202–224.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues for field settings. Chicago: Rand McNally.
Cosley, D., Lam, S. K., Albert, I., Konstan, J. A., & Riedl, J. (2003). Is seeing believing?: How recommender system interfaces affect users’ opinions. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 585–592). New York, NY: ACM.
Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61(4), 532–574.
Czerwinski, M., Tan, D. S., & Robertson, G. G. (2002). Women take a wider view. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 195–202). New York, NY: ACM.
Dabbish, L., Kraut, R., & Patton, J. (2012). Communication and commitment in an online game team. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 879–888). New York, NY: ACM.
Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. Cambridge, NY: Cambridge University Press.
Evans, A., & Wobbrock, J. O. (2012). Taming wild behavior: The input observer for text entry and mouse pointing measures from everyday computer use. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1947–1956). New York, NY: ACM.
Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd.
Fisher, R. A., & Yates, F. (1953). Statistical tables for biological, agricultural and medical research. Edinburgh: Oliver & Boyd.
Galton, F. (1886). Regression towards mediocrity in hereditary stature. The Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246–263.
Gergle, D., Kraut, R. E., & Fussell, S. R. (2013). Using visual information for grounding and awareness in collaborative tasks. Human–Computer Interaction, 28(1), 1–39.
Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Lawrence Erlbaum Associates.
Gujarati, D. N. (1995). Basic econometrics. New York, NY: McGraw-Hill.
Gutwin, C., & Penner, R. (2002). Improving interpretation of remote gestures with telepointer traces. In Proceedings of the ACM SIGCHI conference on computer supported cooperative work (pp. 49–57). New York, NY: ACM.
Hancock, J. T., Landrigan, C., & Silver, C. (2007). Expressing emotion in text-based communication. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 929–932). New York, NY: ACM.
Hancock, G. R., & Mueller, R. O. (2010). The reviewer’s guide to quantitative methods in the social sciences. New York, NY: Routledge.
Harrison, C., Tan, D., & Morris, D. (2010). Skinput: Appropriating the body as an input surface. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 453–462). New York, NY: ACM.
Hornbæk, K. (2011). Some whys and hows of experiments in Human–Computer Interaction. Foundations and Trends in Human–Computer Interaction, 5(4), 299–373.
Johnson, D. H. (1999). The insignificance of statistical significance testing. The Journal of Wildlife Management, 63, 763–772.
Keegan, B., & Gergle, D. (2010). Egalitarians at the gate: One-sided gatekeeping practices in social media. In Proceedings of the ACM SIGCHI conference on computer supported cooperative work (pp. 131–134). New York, NY: ACM.
Kelley, K., & Preacher, K. J. (2012). On effect size. Psychological Methods, 17(2), 137–152.
Kenny, D. A. (1987). Statistics for the social and behavioral sciences. Boston, MA: Little, Brown.
Kennedy, P. (1998). A guide to econometrics. Cambridge, MA: The MIT Press.
Kirk, R. E. (1982). Experimental design: Procedures for the behavioral sciences (2nd ed.). Monterey, CA: Brooks/Cole.
Kirk, R. E. (2013). Experimental design: Procedures for the behavioral sciences. Thousand Oaks, CA: Sage.
Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. Washington, DC: American Psychological Association.
Kline, R. B. (2013). Beyond significance testing: Statistics reform in the behavioral sciences. Washington, DC: American Psychological Association.
Koedinger, K. R., Anderson, J. R., Hadley, W. H., & Mark, M. A. (1997). Intelligent tutoring goes to school in the big city. International Journal of Artificial Intelligence in Education, 8, 30–43.
Kohavi, R., Henne, R. M., & Sommerfield, D. (2007). Practical guide to controlled experiments on the web: Listen to your customers not to the hippo. In Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (pp. 959–967). New York, NY: ACM.
Kohavi, R., & Longbotham, R. (2007). Online experiments: Lessons learned. Computer, 40(9), 103–105.
Kohavi, R., Longbotham, R., & Walker, T. (2010). Online experiments: Practical lessons. Computer, 43(9), 82–85.
Kristensson, P.-O., & Zhai, S. (2004). SHARK^2: A large vocabulary shorthand writing system for pen-based computers. In Proceedings of the ACM symposium on user interface software and technology (pp. 43–52). New York, NY: ACM.
Kruschke, J. K. (2010). What to believe: Bayesian methods for data analysis. Trends in Cognitive Sciences, 14(7), 293–300.
Lazar, J., Feng, J. H., & Hochheiser, H. (2010). Research methods in human-computer interaction. Chichester: Wiley.
Lehmann, E. L. (1993). The Fisher, Neyman–Pearson theories of testing hypotheses: One theory or two? Journal of the American Statistical Association, 88(424), 1242–1249.
Lieberman, H. (2003). The tyranny of evaluation. Retrieved August 15, 2012, from http://web.media.mit.edu/~lieber/Misc/Tyranny-Evaluation.html
MacKenzie, I. S., & Zhang, S. X. (1999). The design and evaluation of a high-performance soft keyboard. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 25–31). New York, NY: ACM.
Martin, D. W. (2004). Doing psychology experiments. Belmont, CA: Thomson/Wadsworth.
McLeod, P. L. (1992). An assessment of the experimental literature on electronic support of group work: Results of a meta-analysis. Human–Computer Interaction, 7(3), 257–280.
Nguyen, D., & Canny, J. (2005). MultiView: Spatially faithful group video conferencing. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 799–808). New York, NY: ACM.
Olson, J. S., Olson, G. M., Storrøsten, M., & Carter, M. (1993). Groupwork close up: A comparison of the group design process with and without a simple group editor. ACM Transactions on Information Systems, 11(4), 321–348.
Oulasvirta, A. (2009). Field experiments in HCI: Promises and challenges. In P. Saariluoma & H. Isomaki (Eds.), Future interaction design II. New York, NY: Springer.
Oulasvirta, A., Tamminen, S., Roto, V., & Kuorelahti, J. (2005). Interaction in 4-second bursts: The fragmented nature of attentional resources in mobile HCI. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 919–928). New York, NY: ACM.
Rogers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using significance tests to evaluate equivalence between two experimental groups. Psychological Bulletin, 113(3), 553.
Rosenthal, R., & Rosnow, R. L. (2008). Essentials of behavioral research: Methods and data analysis (3rd ed.). New York, NY: McGraw-Hill.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.
Smithson, M. (2003). Confidence intervals. Thousand Oaks, CA: Sage.
Stokes, D. E. (1997). Pasteur’s quadrant: Basic science and technological innovation. Washington, DC: Brookings Institution Press.
Tan, D. S., Gergle, D., Scupelli, P., & Pausch, R. (2003). With similar visual angles, larger displays improve spatial performance. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 217–224). New York, NY: ACM.
Tan, D. S., Gergle, D., Scupelli, P. G., & Pausch, R. (2004). Physically large displays improve path integration in 3D virtual navigation tasks. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 439–446). New York, NY: ACM.
Tan, D. S., Gergle, D., Scupelli, P., & Pausch, R. (2006). Physically large displays improve performance on spatial tasks. ACM Transactions on Computer Human Interaction, 13(1), 71–99.
Veinott, E. S., Olson, J., Olson, G. M., & Fu, X. (1999). Video helps remote work: Speakers who need to negotiate common ground benefit from seeing each other. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 302–309). New York, NY: ACM.
Weir, P. (Director). (1998). The Truman show [Motion picture]. Hollywood, CA: Paramount Pictures.
Weisband, S., & Kiesler, S. (1996). Self disclosure on computer forms: Meta-analysis and implications. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 3–10). New York, NY: ACM.
Weiss, N. A. (2008). Introductory statistics. San Francisco, CA: Pearson Addison-Wesley.
Wigdor, D., Forlines, C., Baudisch, P., Barnwell, J., & Shen, C. (2007). Lucid touch: A see-through mobile device. In Proceedings of the ACM symposium on user interface software and technology (pp. 269–278). New York, NY: ACM.
Williams, E. J. (1949). Experimental designs balanced for the estimation of residual effects of treatments. Australian Journal of Chemistry, 2(2), 149–168.
Wilson, M. L., Mackay, W., Chi, E., Bernstein, M., & Nichols, J. (2012). RepliCHI SIG: From a panel to a new submission venue for replication. In Proceedings of the ACM conference extended abstracts on human factors in computing systems (pp. 1185–1188). New York, NY: ACM.
Wobbrock, J. O. (2011). Practical statistics for human-computer interaction: An independent study combining statistics theory and tool know-how. Presented at the Annual workshop of the Human-Computer Interaction Consortium (HCIC ’11). Pacific Grove, CA.
Wobbrock, J. O., Cutrell, E., Harada, S., & MacKenzie, I. S. (2008). An error model for pointing based on Fitts’ law. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1613–1622). New York, NY: ACM.
Yee, N., Bailenson, J. N., & Rickertsen, K. (2007). A meta-analysis of the impact of the inclusion and realism of human-like faces on user experiences in interfaces. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1–10). New York, NY: ACM.
Zhai, S. (2003). Evaluation is the worst form of HCI research except all those other forms that have been tried. Retrieved February 18, 2014, from http://shuminzhai.com/papers/EvaluationDemocracy.htm
Zhai, S., & Kristensson, P.-O. (2003). Shorthand writing on stylus keyboard. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 97–104). New York, NY: ACM.
Zhu, H., Kraut, R., & Kittur, A. (2012). Effectiveness of shared leadership in online communities. In Proceedings of the ACM SIGCHI conference on computer supported cooperative work (pp. 407–416). New York, NY: ACM.
Acknowledgements
We would like to thank Wendy Kellogg, Robert Kraut, Anne Oeldorf-Hirsch, Gary Olson, Judy Olson, and Lauren Scissors for their thoughtful reviews and comments on the chapter.
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Gergle, D., Tan, D.S. (2014). Experimental Research in HCI. In: Olson, J., Kellogg, W. (eds) Ways of Knowing in HCI. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0378-8_9
DOI: https://doi.org/10.1007/978-1-4939-0378-8_9
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-0377-1
Online ISBN: 978-1-4939-0378-8