Comparing Psychometric and Behavioral Predictors of Compliance During Human-AI Interactions

Conference paper in Persuasive Technology (PERSUASIVE 2023)

Abstract

Optimization of human-AI teams hinges on the AI’s ability to tailor its interaction to individual human teammates. A common hypothesis in adaptive AI research is that minor differences in people’s predisposition to trust can significantly impact their likelihood of complying with recommendations from the AI. Predisposition to trust is often measured with self-report inventories that are administered before interactions. We benchmark a popular measure of this kind against behavioral predictors of compliance. We find that the inventory is a less effective predictor of compliance than the behavioral measures in datasets taken from three previous research projects. This suggests a general property that individual differences in initial behavior are more predictive than differences in self-reported trust attitudes. This result also shows a potential for easily accessible behavioral measures to provide an AI with more accurate models without the use of (often costly) survey instruments.

Part of the effort behind this work was sponsored by the Defense Advanced Research Projects Agency (DARPA) under contract number W911NF2010011. The content of the information does not necessarily reflect the position or the policy of the U.S. Government or the Defense Advanced Research Projects Agency, and no official endorsements should be inferred.


Notes

  1. Controlling for mission did not meaningfully change the interpretation of the results.

  2. A likelihood ratio test is another method of comparing such models and will produce similar insights.

References

  1. Ajzen, I.: The theory of planned behavior. Organ. Behav. Hum. Decis. Process. 50(2), 179–211 (1991)

  2. Aliasghari, P., Ghafurian, M., Nehaniv, C.L., Dautenhahn, K.: Effect of domestic trainee robots’ errors on human teachers’ trust. In: Proceedings of the IEEE International Conference on Robot & Human Interactive Communication (RO-MAN), pp. 81–88. IEEE (2021)

  3. Aliasghari, P., Ghafurian, M., Nehaniv, C.L., Dautenhahn, K.: How do different modes of verbal expressiveness of a student robot making errors impact human teachers’ intention to use the robot? In: Proceedings of the 9th International Conference on Human-Agent Interaction, pp. 21–30 (2021)

  4. Amershi, S., et al.: Guidelines for human-AI interaction. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pp. 1–13 (2019)

  5. Aroyo, A.M., Rea, F., Sandini, G., Sciutti, A.: Trust and social engineering in human robot interaction: will a robot make you disclose sensitive information, conform to its recommendations or gamble? IEEE Robot. Autom. Lett. 3(4), 3701–3708 (2018)

  6. Ashleigh, M.J., Higgs, M., Dulewicz, V.: A new propensity to trust scale and its relationship with individual well-being: implications for HRM policies and practices. Hum. Resour. Manage. J. 22(4), 360–376 (2012)

  7. Bargain, O., Aminjonov, U.: Trust and compliance to public health policies in times of covid-19. J. Publ. Econ. 192, 104316 (2020)

  8. Barnes, M.J., Wang, N., Pynadath, D.V., Chen, J.Y.: Human-agent bidirectional transparency. In: Trust in Human-Robot Interaction, pp. 209–232. Elsevier (2021)

  9. Brentano, F.: Psychology from an Empirical Standpoint. Routledge, Milton Park (2012)

  10. Chater, N., Zeitoun, H., Melkonyan, T.: The paradox of social interaction: shared intentionality, we-reasoning, and virtual bargaining. Psychol. Rev. 129(3), 415 (2022)

  11. Chi, O.H., Jia, S., Li, Y., Gursoy, D.: Developing a formative scale to measure consumers’ trust toward interaction with artificially intelligent (AI) social robots in service delivery. Comput. Hum. Behav. 118, 106700 (2021)

  12. Dennett, D.C.: The Intentional Stance. MIT press, Cambridge (1987)

  13. Elliot, J.: Artificial social intelligence for successful teams (ASIST) (2021). www.darpa.mil/program/artificial-social-intelligence-for-successful-teams

  14. Gurney, N., Pynadath, D., Wang, N.: My actions speak louder than your words: when user behavior predicts their beliefs about agents’ attributes. arXiv preprint arXiv:2301.09011 (2023)

  15. Gurney, N., Pynadath, D.V., Wang, N.: Measuring and predicting human trust in recommendations from an AI teammate. In: International Conference on Human-Computer Interaction, pp. 22–34. Springer (2022). https://doi.org/10.1007/978-3-031-05643-7_2

  16. Hancock, P.A., Billings, D.R., Schaefer, K.E., Chen, J.Y., De Visser, E.J., Parasuraman, R.: A meta-analysis of factors affecting trust in human-robot interaction. Hum. Factors 53(5), 517–527 (2011)

  17. Hoff, K.A., Bashir, M.: Trust in automation: integrating empirical evidence on factors that influence trust. Hum. Factors 57(3), 407–434 (2015)

  18. Jessup, S.A., Schneider, T.R., Alarcon, G.M., Ryan, T.J., Capiola, A.: The measurement of the propensity to trust automation. In: Chen, J.Y.C., Fragomeni, G. (eds.) HCII 2019. LNCS, vol. 11575, pp. 476–489. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21565-1_32

  19. Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artif. Intell. 101(1), 99–134 (1998)

  20. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey. J. Artif. Intell. Res. 4, 237–285 (1996)

  21. Lee, J.D., See, K.A.: Trust in automation: designing for appropriate reliance. Hum. Factors 46(1), 50–80 (2004)

  22. Lutz, C., Tamó-Larrieux, A.: The robot privacy paradox: understanding how privacy concerns shape intentions to use social robots. Hum. Mach. Commun. 1, 87–111 (2020)

  23. McKnight, D.H., Choudhury, V., Kacmar, C.: Developing and validating trust measures for e-commerce: an integrative typology. Inf. Syst. Res. 13(3), 334–359 (2002)

  24. Merritt, S.M., Huber, K., LaChapell-Unnerstall, J., Lee, D.: Continuous calibration of trust in automated systems. Tech. rep., University of Missouri-St. Louis (2014)

  25. Millikan, R.G.: Biosemantics. J. Philos. 86(6), 281–297 (1989)

  26. Mischel, W.: Personality and Assessment. Psychology Press, London (2013)

  27. Nomura, T., Kanda, T., Suzuki, T.: Experimental investigation into influence of negative attitudes toward robots on human-robot interaction. AI Soc. 20(2), 138–150 (2006)

  28. Nomura, T., Suzuki, T., Kanda, T., Kato, K.: Measurement of negative attitudes toward robots. Interact. Stud. 7(3), 437–454 (2006)

  29. Ouellette, J.A., Wood, W.: Habit and intention in everyday life: the multiple processes by which past behavior predicts future behavior. Psychol. Bull. 124(1), 54 (1998)

  30. Parasuraman, R., Riley, V.: Humans and automation: use, misuse, disuse, abuse. Hum. Factors 39(2), 230–253 (1997)

  31. Pynadath, D.V., Gurney, N., Wang, N.: Explainable reinforcement learning in human-robot teams: the impact of decision-tree explanations on transparency. In: 2022 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pp. 749–756. IEEE (2022)

  32. Pynadath, D.V., Wang, N., Kamireddy, S.: A markovian method for predicting trust behavior in human-agent interaction. In: Proceedings of the 7th International Conference on Human-Agent Interaction, pp. 171–178 (2019)

  33. Rossi, A., Dautenhahn, K., Koay, K.L., Walters, M.L.: The impact of peoples’ personal dispositions and personalities on their trust of robots in an emergency scenario. Paladyn J. Behav. Robot. 9(1), 137–154 (2018)

  34. Rossi, A., Dautenhahn, K., Koay, K.L., Walters, M.L., Holthaus, P.: Evaluating people’s perceptions of trust in a robot in a repeated interactions study. In: Wagner, A.R. (ed.) ICSR 2020. LNCS (LNAI), vol. 12483, pp. 453–465. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62056-1_38

  35. Schaefer, K.: The perception and measurement of human-robot trust. Ph.D. thesis, University of Central Florida (2013). stars.library.ucf.edu/etd/2688

  36. Schrum, M.L., Johnson, M., Ghuy, M., Gombolay, M.C.: Four years in review: statistical practices of likert scales in human-robot interaction studies. In: Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, pp. 43–52 (2020)

  37. Seeber, I., et al.: Machines as teammates: a research agenda on AI in team collaboration. Inf. Manage. 57(2), 103174 (2020)

  38. Shneiderman, B.: Human-centered artificial intelligence: reliable, safe & trustworthy. Int. J. Hum. Comput. Interact. 36(6), 495–504 (2020)

  39. Stevenson, D.C.: The Internet Classics Archive: On Interpretation by Aristotle (2009). https://classics.mit.edu/Aristotle/interpretation.html

  40. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT press, Cambridge (2018)

  41. Tauchert, C., Mesbah, N., et al.: Following the robot? Investigating users’ utilization of advice from robo-advisors. In: Proceedings of the International Conference on Information Systems (2019)

  42. Textor, C., Pak, R.: Paying attention to trust: exploring the relationship between attention control and trust in automation. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 65, no. 1, pp. 817–821. SAGE Publications, Los Angeles (2021)

  43. Venkatesh, V.: Determinants of perceived ease of use: integrating control, intrinsic motivation, and emotion into the technology acceptance model. Inf. Syst. Res. 11(4), 342–365 (2000)

  44. Wang, N., Pynadath, D.V., Hill, S.G.: The impact of POMDP-generated explanations on trust and performance in human-robot teams. In: Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, pp. 997–1005 (2016)

  45. Wang, N., Pynadath, D.V., Hill, S.G.: Trust calibration within a human-robot team: Comparing automatically generated explanations. In: 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 109–116. IEEE (2016)

  46. Wang, N., Pynadath, D.V., Rovira, E., Barnes, M.J., Hill, S.G.: Is It My Looks? Or Something I Said? The impact of explanations, embodiment, and expectations on trust and performance in human-robot teams. In: Ham, J., Karapanos, E., Morita, P.P., Burns, C.M. (eds.) PERSUASIVE 2018. LNCS, vol. 10809, pp. 56–69. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78978-1_5

  47. Wong, C.M.L., Jensen, O.: The paradox of trust: perceived risk and public compliance during the COVID-19 pandemic in Singapore. J. Risk Res. 23(7–8), 1021–1030 (2020)

  48. Xu, A., Dudek, G.: OPTIMo: online probabilistic trust inference model for asymmetric human-robot collaborations. In: 2015 10th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 221–228. IEEE (2015)

  49. Yagoda, R.E., Gillan, D.J.: You want me to trust a robot? The development of a human-robot interaction trust scale. Int. J. Soc. Robot. 4(3), 235–248 (2012)

Author information

Corresponding author

Correspondence to Nikolos Gurney.

Appendices

Appendix A Figures

Fig. 1. Disposition to Trust Inventory Items

Appendix B Data and Models

1.1 B.1 Models

We used linear regression to model and test the predictive value of the various behavioral measures. The basic approach is to fit a reference model (Model 1) in which the outcome measure, future behavior, is predicted by the treatment conditions alone, such as:

$$\begin{aligned} Y_{iFB} = \beta _0 + \beta _\textrm{Treat} X_\textrm{iTreat} + \epsilon _i \end{aligned}$$
(1)

We use this general form for the reference (null) model across the experiments. Note that there are different FB measures; which one is used in a given set of models depends on the independent variable in question. A given reference and test model, however, always use the same dependent variable.
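
For concreteness, the following is a minimal sketch, not the authors' code, of fitting this reference model with statsmodels; the column names `future_compliance` and `treatment` are hypothetical placeholders for the FB outcome and the treatment condition.

```python
# Minimal sketch of fitting the reference model (Eq. 1); column names are
# hypothetical placeholders, not the actual variable names in the datasets.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("study1.csv")  # one row per participant (hypothetical file)
model_1 = smf.ols("future_compliance ~ C(treatment)", data=df).fit()
print(model_1.summary())
```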

Fig. 2. In Study 3, only the robot's communication varied across conditions; it always made the same mistakes, in the same order. These features mean that the data lend themselves to a concise visual representation, which we present in the top panel. The colors in this panel indicate the robot's recommendation type; specifically, the robot made two deadly and three innocuous mistakes, which are highlighted. Overall, there is a trend of increasing compliance. The panels for Studies 1 and 2 (bottom left and bottom right, respectively) indicate only compliance percentage, not the type of recommendation made by the robot.

Model 2: The first category of test models incorporates participants’ DTI score as an independent variable:

$$\begin{aligned} Y_{iFB} = \beta _0 + \beta _\textrm{Treat} X_\textrm{iTreat} + \beta _\textrm{DTI} X_\textrm{iDTI} + \epsilon _i \end{aligned}$$
(2)

Note that the reference model is nested within this model. In other words, Eq. (2) represents the alternative hypothesis that adding the predictor variable accounts for more variance, i.e., significantly improves model performance. Comparing the two models is as simple as applying an F-test, which in this case tells us whether the more complex model yields a statistically different residual sum of squares. If it does, we can reject the null hypothesis (that the reference model is sufficient) in favor of the alternative hypothesis (that adding the additional predictor variable(s) was warranted). The F-test, in this instance, can be thought of as a check on the utility of adding DTI or other measures to the model (Footnote 2). When this test returns a p-value less than 0.05, we conclude that the residual sums of squares of the two models differ significantly at \(\alpha = 0.05\); in other words, the new explanatory variable is warranted because it significantly increases the amount of variance explained by the model (i.e., the RSS is lower).
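
The nested comparison rests on the standard F statistic for comparing a reduced model (here, Model 1) with a fuller model (here, Model 2):

$$\begin{aligned} F = \frac{(\textrm{RSS}_1 - \textrm{RSS}_2)/(p_2 - p_1)}{\textrm{RSS}_2/(n - p_2)} \end{aligned}$$

where \(\textrm{RSS}_1\) and \(\textrm{RSS}_2\) are the residual sums of squares of the reference and test models, \(p_1\) and \(p_2\) their numbers of estimated parameters, and \(n\) the number of participants. Continuing the sketch above, and assuming a hypothetical `dti` column holding participants’ DTI scores, the comparison could be carried out as:

```python
# Nested F-test of Model 1 vs. Model 2 (adding the DTI score); "dti" is a
# hypothetical column name for the self-reported disposition to trust.
from statsmodels.stats.anova import anova_lm

model_2 = smf.ols("future_compliance ~ C(treatment) + dti", data=df).fit()
print(anova_lm(model_1, model_2))  # F-test on the change in residual sum of squares
```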

Model 3: The next category of test models incorporates participants’ past behavior as independent variables, for example, the model:

$$\begin{aligned} Y_{iFB} = \beta _0 + \beta _\textrm{Treat} X_\textrm{iTreat} + \beta _\textrm{FC} X_\textrm{iFC} + \epsilon _i \end{aligned}$$
(3)

This model uses participants’ first choice to predict their compliance across all of the remaining choices they faced. Again, the reference model is nested within it, and a simple F-test reveals the utility of the FC predictor. We construct similar models for M1C, AFM, and AC-AFM (using the appropriate FB measures).
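
Under the same hypothetical column names as the earlier sketch, the Model 3 comparison for FC is analogous; the M1C, AFM, and AC-AFM models simply swap in the corresponding predictor and FB outcome.

```python
# Nested F-test of Model 1 vs. Model 3 (adding the first compliance choice);
# "first_choice" is a hypothetical 0/1 column for FC.
model_3 = smf.ols("future_compliance ~ C(treatment) + first_choice", data=df).fit()
print(anova_lm(model_1, model_3))
```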

Model 4: Since DTI and the past behavior measures likely account for different portions of the variance, directly comparing Eqs. (2) and (3) via \(R^2\) is not entirely informative. We therefore introduce a third category of test models in which both DTI and a past behavior measure are included. In the case of FC, the model is:

$$\begin{aligned} Y_{iFB} = \beta _0 + \beta _\textrm{Treat} X_\textrm{iTreat} + \beta _\textrm{DTI} X_\textrm{iDTI} + \beta _\textrm{FC} X_\textrm{iFC} + \epsilon _i \end{aligned}$$
(4)

These models facilitate assessing whether the added complexity of including both DTI and the past behavior measure is warranted, again using F-tests. Finally, for readability, we refer to relevant statistics within the text of the manuscript but place regression and other tables for each set of models and their associated tests in the appendix. These tables include, in the same order as above, models that facilitate comparing DTI and the behavioral measures. Each regression table is followed by a table presenting the results of the relevant F-tests.
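
As a sketch under the same assumed column names, the Model 4 fit and the two F-tests of interest (does FC add to DTI, and does DTI add to FC?) could be expressed as:

```python
# Model 4 includes both DTI and FC; comparing it against Models 2 and 3 asks
# whether each predictor explains variance beyond the other.
model_4 = smf.ols("future_compliance ~ C(treatment) + dti + first_choice",
                  data=df).fit()
print(anova_lm(model_2, model_4))  # is FC warranted over and above DTI?
print(anova_lm(model_3, model_4))  # is DTI warranted over and above FC?
```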

1.2 B.2 Data

We used data from experiments conducted as part of a long-term research project on explainability and AI. Participants in these experiments team with a simulated robot during reconnaissance missions. The missions involve entering buildings to determine whether threats are present. The robot goes first and is equipped with a camera, a microphone, and sensors for nuclear, biological, and chemical threats. These sensors are not perfectly reliable. Based on the data it collects with its sensors, the robot recommends whether the participant should put on protective gear. The participant then chooses whether to wear the gear, i.e., whether or not to comply with the robot’s recommendation. When participants wear the gear, it always neutralizes any threat. If they do not wear it and encounter a threat, they die in the virtual world, which in practice means incurring a prohibitive time penalty. Finally, participants incur a slight time delay (much smaller than the penalty for dying in the virtual world) when equipping the gear.

In all three studies, the robot based its recommendations on noisy sensor readings fed into a policy computed via either a Partially Observable Markov Decision Process (POMDP) [19] or model-free reinforcement learning (RL) [20, 40], with the time costs and deaths incurred serving as the reward signal. The robot performed significantly better than chance across the studies, which means that complying with the robot was highly correlated with making the normative choice, i.e., wearing protective equipment at the right time.
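
To make the sensor-to-recommendation pipeline concrete, here is a purely illustrative, toy belief-threshold rule of the kind a POMDP policy can induce; it is not the robots’ actual policy, and the sensor reliabilities and threshold below are invented for the sketch.

```python
# Toy illustration only: Bayesian belief update over "threat present" from one
# noisy sensor reading, followed by a threshold rule for recommending gear.
# The hit rate, false-alarm rate, and threshold are made-up values.
def update_belief(prior: float, reading_positive: bool,
                  hit_rate: float = 0.8, false_alarm: float = 0.2) -> float:
    """Posterior P(threat) after observing one sensor reading."""
    like_threat = hit_rate if reading_positive else 1.0 - hit_rate
    like_clear = false_alarm if reading_positive else 1.0 - false_alarm
    return (like_threat * prior) / (like_threat * prior + like_clear * (1.0 - prior))

def recommend_gear(belief: float, threshold: float = 0.5) -> bool:
    """Recommend protective gear once the threat belief crosses the threshold."""
    return belief >= threshold
```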

Participants in all three studies completed the 12-item DTI before starting their assigned mission(s). Full experimental details and the results of the treatment conditions are reported in the original publications, so for brevity we do not repeat those findings here. Note, however, that the n’s we report may differ from the original papers because of incomplete observations (some participants chose not to complete the DTI).

Study 1 participants (\(n=198\), Amazon Mechanical Turk) completed three missions, each with eight buildings [44]. They were randomly paired with one of two POMDP-based robot types: a high-ability robot that was never wrong or a low-ability robot that made mistakes 20% of the time (i.e., was 80% reliable). Both robot types were crossed with four recommendation explanation conditions: none, confidence level, sensor readings version 1, and sensor readings version 2. The experiment was fully between-subjects, meaning that each participant interacted with only one robot type and received only one type of explanation throughout the missions. The coefficient \(\beta _\textrm{Treat}\) in the models for Study 1 captures which information condition participants experienced. First compliance choice (FC) takes the value 1 if a participant heeded the robot’s recommendation for the first building and 0 if not. Mission 1 compliance (M1C), on the other hand, is the fraction of times a participant complied with the robot’s recommendations during the first mission. The future behavior (FB) measure associated with FC for Study 1 is thus the fraction of times a participant complied for the remaining 23 buildings; for M1C, it is the fraction of times a participant complied with the robot’s advice during missions two and three. Participants were not told whether they were interacting with the same robot across missions; instead, the robot started each mission as if it had never interacted with the participant before.
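
As a sketch under assumed column names (`participant`, `mission`, `building`, and a 0/1 `complied` flag, none of which are the datasets’ actual names), the Study 1 measures could be computed from trial-level data as follows:

```python
# Hypothetical computation of the Study 1 behavioral measures and their
# associated future-behavior (FB) outcomes from trial-level data.
import pandas as pd

trials = pd.read_csv("study1_trials.csv")
trials = trials.sort_values(["participant", "mission", "building"])

def study1_measures(g: pd.DataFrame) -> pd.Series:
    return pd.Series({
        "FC": g["complied"].iloc[0],                           # first compliance choice
        "FB_FC": g["complied"].iloc[1:].mean(),                # remaining 23 buildings
        "M1C": g.loc[g["mission"] == 1, "complied"].mean(),    # mission 1 compliance
        "FB_M1C": g.loc[g["mission"] > 1, "complied"].mean(),  # missions 2 and 3
    })

measures = trials.groupby("participant").apply(study1_measures)
```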

Study 2 participants (\(n=53\), cadets at West Point) completed eight missions, each with a different POMDP-based robot [46]. In each mission, the human-robot team carried out a reconnaissance task covering 15 buildings. The mission order was fixed (i.e., the buildings were always searched in the same order, within and across missions), but the robot order was randomized. The \(2\times 2\times 2\) design crossed robot acknowledgment of mistakes (none/acknowledge), recommendation explanation (none/confidence), and embodiment (robot-like/dog-like). Unlike Study 1, participants interacted with a different robot during each mission. Nevertheless, to demonstrate the robustness of the simple behavioral measures, we rely on the same first compliance choice (FC) as in Study 1 and a similar mission 1 compliance (M1C). The compliance measures naturally cover a longer horizon: 119 buildings and 105 buildings, respectively. The \(\beta _\textrm{Treat}\) of the models for Study 2 captures the robot type of the first mission. It is possible that the ordering of robot advisors mattered; however, the data are insufficient to specify a hierarchical model that could uncover such an effect.

Study 3 participants (\(n=148\), Amazon Mechanical Turk) completed one mission covering 45 buildings with an RL-based robot in a fully between-subjects design [14, 15, 31]. The treatment conditions held the robot’s ability constant but varied how it explained its recommendations: no explanation, explanation of its decisions, or explanation of its decisions and its learning. Again, the first compliance choice (FC) is defined as in the previous two studies, and the FC outcome measure is the compliance fraction for the remaining 44 buildings. Mission 1 compliance (M1C) is not applicable because the entire experiment consisted of a single mission. Given that building order and robot performance were fixed across treatment conditions, however, two additional compliance measures become meaningful: choice after the first mistake (AFM) and average compliance through the first mistake (AC-AFM). The first mistake occurred in building six, so a participant’s decision for building seven is the AFM measure, and the fraction of times they complied during the first seven buildings is AC-AFM. Correspondingly, the dependent variable for these measures is the fraction of times a given participant complied during the remaining 38 buildings.
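
A corresponding sketch for the Study 3 measures, again with hypothetical column names, assuming buildings numbered 1 through 45 and the first robot mistake at building six as described above:

```python
# Hypothetical computation of the Study 3 measures; the AFM choice is the
# decision for building 7 (the building after the first mistake).
def study3_measures(g: pd.DataFrame) -> pd.Series:
    g = g.sort_values("building")
    return pd.Series({
        "FC": g.loc[g["building"] == 1, "complied"].iloc[0],
        "AFM": g.loc[g["building"] == 7, "complied"].iloc[0],
        "AC_AFM": g.loc[g["building"] <= 7, "complied"].mean(),  # first seven buildings
        "FB_AFM": g.loc[g["building"] > 7, "complied"].mean(),   # remaining 38 buildings
    })
```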

Appendix C Tables

Table 1. Study 1 FC Models
Table 2. Study 1 FC Model Comparisons
Table 3. Study 1 M1C Models
Table 4. Study 1 M1C Model Comparisons
Table 5. Study 2 FC Models
Table 6. Study 2 FC Model Comparison
Table 7. Study 2 M1C Models
Table 8. Study 2 M1C Model Comparison
Table 9. Study 3 FC Models
Table 10. Study 3 FC Model Comparison
Table 11. Study 3 AFM Models
Table 12. Study 3 AFM Model Comparison
Table 13. Study 3 AC-AFM Models
Table 14. Study 3 AC-AFM Model Comparison
Table 15. DTI as a Predictor of First Compliance Choice

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gurney, N., Pynadath, D.V., Wang, N. (2023). Comparing Psychometric and Behavioral Predictors of Compliance During Human-AI Interactions. In: Meschtscherjakov, A., Midden, C., Ham, J. (eds) Persuasive Technology. PERSUASIVE 2023. Lecture Notes in Computer Science, vol 13832. Springer, Cham. https://doi.org/10.1007/978-3-031-30933-5_12

  • DOI: https://doi.org/10.1007/978-3-031-30933-5_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30932-8

  • Online ISBN: 978-3-031-30933-5
