Abstract
Optimization of human-AI teams hinges on the AI’s ability to tailor its interaction to individual human teammates. A common hypothesis in adaptive AI research is that minor differences in people’s predisposition to trust can significantly impact their likelihood of complying with recommendations from the AI. Predisposition to trust is often measured with self-report inventories administered before interaction. We benchmark a popular measure of this kind against behavioral predictors of compliance, using datasets from three previous research projects. We find that the inventory is a less effective predictor of compliance than the behavioral measures. This suggests a general property: individual differences in initial behavior are more predictive of later compliance than differences in self-reported trust attitudes. This result also shows the potential for easily accessible behavioral measures to give an AI more accurate models of its human teammates without the use of (often costly) survey instruments.
Part of the effort behind this work was sponsored by the Defense Advanced Research Projects Agency (DARPA) under contract number W911NF2010011. The content of the information does not necessarily reflect the position or the policy of the U.S. Government or the Defense Advanced Research Projects Agency, and no official endorsements should be inferred.
Notes
1. Controlling for mission did not meaningfully change the interpretation of the results.
2. A likelihood ratio test is another method of comparing such models and will produce similar insights.
References
Ajzen, I.: The theory of planned behavior. Organ. Behav. Hum. Decis. Process. 50(2), 179–211 (1991)
Aliasghari, P., Ghafurian, M., Nehaniv, C.L., Dautenhahn, K.: Effect of domestic trainee robots’ errors on human teachers’ trust. In: Proceedings of the IEEE International Conference on Robot & Human Interactive Communication (RO-MAN), pp. 81–88. IEEE (2021)
Aliasghari, P., Ghafurian, M., Nehaniv, C.L., Dautenhahn, K.: How do different modes of verbal expressiveness of a student robot making errors impact human teachers’ intention to use the robot? In: Proceedings of the 9th International Conference on Human-Agent Interaction, pp. 21–30 (2021)
Amershi, S., et al.: Guidelines for human-AI interaction. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pp. 1–13 (2019)
Aroyo, A.M., Rea, F., Sandini, G., Sciutti, A.: Trust and social engineering in human robot interaction: will a robot make you disclose sensitive information, conform to its recommendations or gamble? IEEE Robot. Autom. Lett. 3(4), 3701–3708 (2018)
Ashleigh, M.J., Higgs, M., Dulewicz, V.: A new propensity to trust scale and its relationship with individual well-being: implications for HRM policies and practices. Hum. Resour. Manage. J. 22(4), 360–376 (2012)
Bargain, O., Aminjonov, U.: Trust and compliance to public health policies in times of covid-19. J. Publ. Econ. 192, 104316 (2020)
Barnes, M.J., Wang, N., Pynadath, D.V., Chen, J.Y.: Human-agent bidirectional transparency. In: Trust in Human-Robot Interaction, pp. 209–232. Elsevier (2021)
Brentano, F.: Psychology from an Empirical Standpoint. Routledge, Milton Park (2012)
Chater, N., Zeitoun, H., Melkonyan, T.: The paradox of social interaction: shared intentionality, we-reasoning, and virtual bargaining. Psychol. Rev. 129(3), 415 (2022)
Chi, O.H., Jia, S., Li, Y., Gursoy, D.: Developing a formative scale to measure consumers’ trust toward interaction with artificially intelligent (AI) social robots in service delivery. Comput. Hum. Behav. 118, 106700 (2021)
Dennett, D.C.: The Intentional Stance. MIT press, Cambridge (1987)
Elliot, J.: Artificial social intelligence for successful teams (ASIST) (2021). www.darpa.mil/program/artificial-social-intelligence-for-successful-teams
Gurney, N., Pynadath, D., Wang, N.: My actions speak louder than your words: when user behavior predicts their beliefs about agents’ attributes. arXiv preprint arXiv:2301.09011 (2023)
Gurney, N., Pynadath, D.V., Wang, N.: Measuring and predicting human trust in recommendations from an AI teammate. In: International Conference on Human-Computer Interaction, pp. 22–34. Springer (2022). https://doi.org/10.1007/978-3-031-05643-7_2
Hancock, P.A., Billings, D.R., Schaefer, K.E., Chen, J.Y., De Visser, E.J., Parasuraman, R.: A meta-analysis of factors affecting trust in human-robot interaction. Hum. Factors 53(5), 517–527 (2011)
Hoff, K.A., Bashir, M.: Trust in automation: integrating empirical evidence on factors that influence trust. Hum. Factors 57(3), 407–434 (2015)
Jessup, S.A., Schneider, T.R., Alarcon, G.M., Ryan, T.J., Capiola, A.: The measurement of the propensity to trust automation. In: Chen, J.Y.C., Fragomeni, G. (eds.) HCII 2019. LNCS, vol. 11575, pp. 476–489. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21565-1_32
Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artif. Intell. 101(1), 99–134 (1998)
Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey. J. Artif. Intell. Res. 4, 237–285 (1996)
Lee, J.D., See, K.A.: Trust in automation: designing for appropriate reliance. Hum. Factors 46(1), 50–80 (2004)
Lutz, C., Tamó-Larrieux, A.: The robot privacy paradox: understanding how privacy concerns shape intentions to use social robots. Hum. Mach. Commun. 1, 87–111 (2020)
McKnight, D.H., Choudhury, V., Kacmar, C.: Developing and validating trust measures for e-commerce: an integrative typology. Inf. Syst. Res. 13(3), 334–359 (2002)
Merritt, S.M., Huber, K., LaChapell-Unnerstall, J., Lee, D.: Continuous Calibration of Trust in Automated Systems. Tech. rep., Missouri University-St. Louis (2014)
Millikan, R.G.: Biosemantics. J. Philos. 86(6), 281–297 (1989)
Mischel, W.: Personality and Assessment. Psychology Press, London (2013)
Nomura, T., Kanda, T., Suzuki, T.: Experimental investigation into influence of negative attitudes toward robots on human-robot interaction. AI Soc. 20(2), 138–150 (2006)
Nomura, T., Suzuki, T., Kanda, T., Kato, K.: Measurement of negative attitudes toward robots. Interact. Stud. 7(3), 437–454 (2006)
Ouellette, J.A., Wood, W.: Habit and intention in everyday life: the multiple processes by which past behavior predicts future behavior. Psychol. Bull. 124(1), 54 (1998)
Parasuraman, R., Riley, V.: Humans and automation: use, misuse, disuse, abuse. Hum. Factors 39(2), 230–253 (1997)
Pynadath, D.V., Gurney, N., Wang, N.: Explainable reinforcement learning in human-robot teams: the impact of decision-tree explanations on transparency. In: 2022 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pp. 749–756. IEEE (2022)
Pynadath, D.V., Wang, N., Kamireddy, S.: A markovian method for predicting trust behavior in human-agent interaction. In: Proceedings of the 7th International Conference on Human-Agent Interaction, pp. 171–178 (2019)
Rossi, A., Dautenhahn, K., Koay, K.L., Walters, M.L.: The impact of peoples’ personal dispositions and personalities on their trust of robots in an emergency scenario. Paladyn J. Behav. Robot. 9(1), 137–154 (2018)
Rossi, A., Dautenhahn, K., Koay, K.L., Walters, M.L., Holthaus, P.: Evaluating people’s perceptions of trust in a robot in a repeated interactions study. In: Wagner, A.R. (ed.) ICSR 2020. LNCS (LNAI), vol. 12483, pp. 453–465. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62056-1_38
Schaefer, K.: The perception and measurement of human-robot trust (2013). stars.library.ucf.edu/etd/2688
Schrum, M.L., Johnson, M., Ghuy, M., Gombolay, M.C.: Four years in review: statistical practices of likert scales in human-robot interaction studies. In: Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, pp. 43–52 (2020)
Seeber, I., et al.: Machines as teammates: a research agenda on AI in team collaboration. Inf. Manage. 57(2), 103174 (2020)
Shneiderman, B.: Human-centered artificial intelligence: reliable, safe & trustworthy. Int. J. Hum. Comput. Interact. 36(6), 495–504 (2020)
Stevenson, D.C.: The Internet Classics Archive: On Interpretation by Aristotle (2009). https://classics.mit.edu/Aristotle/interpretation.html
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT press, Cambridge (2018)
Tauchert, C., Mesbah, N., et al.: Following the robot? Investigating users’ utilization of advice from robo-advisors. In: Proceedings of the International Conference on Information Systems (2019)
Textor, C., Pak, R.: Paying attention to trust: exploring the relationship between attention control and trust in automation. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 65, no. 1, pp. 817–821. SAGE Publications, Los Angeles, CA (2021)
Venkatesh, V.: Determinants of perceived ease of use: integrating control, intrinsic motivation, and emotion into the technology acceptance model. Inf. Syst. Res. 11(4), 342–365 (2000)
Wang, N., Pynadath, D.V., Hill, S.G.: The impact of POMDP-generated explanations on trust and performance in human-robot teams. In: Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, pp. 997–1005 (2016)
Wang, N., Pynadath, D.V., Hill, S.G.: Trust calibration within a human-robot team: Comparing automatically generated explanations. In: 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 109–116. IEEE (2016)
Wang, N., Pynadath, D.V., Rovira, E., Barnes, M.J., Hill, S.G.: Is It My Looks? Or Something I Said? The impact of explanations, embodiment, and expectations on trust and performance in human-robot teams. In: Ham, J., Karapanos, E., Morita, P.P., Burns, C.M. (eds.) PERSUASIVE 2018. LNCS, vol. 10809, pp. 56–69. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78978-1_5
Wong, C.M.L., Jensen, O.: The paradox of trust: perceived risk and public compliance during the COVID-19 pandemic in Singapore. J. Risk Res. 23(7–8), 1021–1030 (2020)
Xu, A., Dudek, G.: OPTIMo: online probabilistic trust inference model for asymmetric human-robot collaborations. In: 2015 10th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 221–228. IEEE (2015)
Yagoda, R.E., Gillan, D.J.: You want me to trust a robot? The development of a human-robot interaction trust scale. Int. J. Soc. Robot. 4(3), 235–248 (2012)
Appendices
Appendix A Figures
Appendix B Data and Models
1.1 B.1 Models
We used linear regression to model and test the predictive value of the various behavioral measures. The basic approach is to fit a reference model (Model 1) in which the outcome measure, future behavior, is predicted by the treatment conditions alone, such as:
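The display equation did not survive in this copy; based on the coefficients discussed below, a reference model of this form would be

```latex
\textrm{FB} = \beta_0 + \beta_\textrm{Treat}\,\textrm{Treat} + \varepsilon \qquad (1)
```

where FB is the future-behavior (compliance) measure and Treat encodes the treatment condition.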
We use this general form for the reference (null) model across the experiments. Note that there are different FB measures; which one is used in a given set of models depends on the independent variable in question. A given reference and test model, however, always use the same dependent variable.
Model 2: The first category of test models incorporates participants’ DTI score as an independent variable:
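The display equation is missing here; consistent with the reference model and the text that follows, it would take the form

```latex
\textrm{FB} = \beta_0 + \beta_\textrm{Treat}\,\textrm{Treat} + \beta_\textrm{DTI}\,\textrm{DTI} + \varepsilon \qquad (2)
```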
Note that the reference model is nested within this model. In other words, Eq. (2) represents the alternative hypothesis that adding the predictor variable accounts for more variance, i.e., significantly improves model performance. Comparing the two models is as simple as applying an F-test, which in this case tells us whether the more complex model yields a statistically different residual sum of squares (RSS). If it does, we can reject the null hypothesis that the reference model is sufficient in favor of the alternative hypothesis that adding the predictor variable(s) was warranted. The F-test, in this instance, can be thought of as a way to investigate the utility of adding DTI or other measures to the model (Footnote 2). When this test returns a p-value less than 0.05, we conclude that the RSS values of the two models differ significantly at \(\alpha = 0.05\); in other words, the new explanatory variable is warranted because it significantly increases the variance explained by the model (i.e., lowers the RSS).
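As a concrete illustration, the nested-model F-test reduces to a simple computation over the two models’ residual sums of squares. The sketch below is ours (function name and RSS values are illustrative only), showing the statistic that is then compared against the relevant F distribution:

```python
def nested_f_test(rss_reduced, rss_full, df_added, df_resid_full):
    """F-statistic for comparing two nested OLS models.

    rss_reduced    -- residual sum of squares of the reference model
    rss_full       -- RSS of the model with the added predictor(s)
    df_added       -- number of predictors added (1 when adding DTI alone)
    df_resid_full  -- residual degrees of freedom of the larger model
    """
    return ((rss_reduced - rss_full) / df_added) / (rss_full / df_resid_full)

# Hypothetical values: adding one predictor drops the RSS from 12.0 to 10.0,
# leaving 100 residual degrees of freedom in the larger model.
f_stat = nested_f_test(rss_reduced=12.0, rss_full=10.0,
                       df_added=1, df_resid_full=100)
# f_stat == 20.0; compare against the F(1, 100) critical value
# (roughly 3.94 at alpha = 0.05) or convert it to a p-value.
```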
Model 3: The next category of test models incorporates participants’ past behavior as independent variables, for example, the model:
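The display equation is missing here; with FC as the past-behavior predictor, it would take the form

```latex
\textrm{FB} = \beta_0 + \beta_\textrm{Treat}\,\textrm{Treat} + \beta_\textrm{FC}\,\textrm{FC} + \varepsilon \qquad (3)
```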
This model uses participants’ first choice to predict their compliance across all of the remaining choices they faced. Again, the reference model is nested within it, and a simple F-test reveals the utility of the FC predictor. We construct similar models for M1C, AFM, and AC-AFM (using the appropriate FB measures).
Model 4: Since DTI and the past behavior measures likely account for different variance in the models, directly comparing Eqs. (2) and (3) via \(R^2\) is not entirely informative. Thus, we introduce a third category of test models in which both DTI and a past behavior measure are included. In the case of FC, the model is:
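The display equation is missing here; combining the predictors of the two preceding test models, it would take the form

```latex
\textrm{FB} = \beta_0 + \beta_\textrm{Treat}\,\textrm{Treat} + \beta_\textrm{DTI}\,\textrm{DTI} + \beta_\textrm{FC}\,\textrm{FC} + \varepsilon \qquad (4)
```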
These models facilitate assessing whether the added complexity of including both DTI and the past behavior measure is warranted, again using F-tests. Finally, for readability, we refer to relevant statistics within the text of the manuscript but place regression and other tables for each set of models and their associated tests in the appendix. These tables include, in the same order as above, models that facilitate comparing DTI and the behavioral measures. Each regression table is followed by a table presenting the results of the relevant F-tests.
1.2 B.2 Data
We used data from experiments conducted as part of a long-term research project on explainability and AI. Participants in these experiments team with a simulated robot during reconnaissance missions. The missions involve entering buildings to determine whether threats are present. The robot goes first and is equipped with a camera, a microphone, and sensors for nuclear, biological, and chemical threats; these sensors are not perfectly reliable. Based on the data collected by its sensors, the robot makes a recommendation to the participant about putting on protective gear. The participant then chooses whether to wear the gear, i.e., whether or not to comply with the robot’s recommendation. When participants wear the gear, it always neutralizes any threat. If they do not wear it and encounter a threat, they die in the virtual world but, in reality, incur a prohibitive time penalty. Finally, participants incur a slight time delay (much smaller than the penalty for death in the virtual world) when equipping the gear.
In all three studies, the robot based its recommendations on the noisy sensor readings, used as input to a policy computed through either Partially Observable Markov Decision Processes (POMDPs) [19] or model-free reinforcement learning (RL) [20, 40] with a reward signal based on the time costs and deaths incurred. The robot performed significantly better than chance across the studies, meaning that compliance was highly correlated with making the normative choice, i.e., wearing the protective equipment at the right time.
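To make the recommendation pipeline concrete, the following sketch shows how a noisy binary threat sensor could drive a wear/skip recommendation via a Bayesian belief update. This is our own simplification for illustration, not the authors’ actual POMDP or RL policy; the function names, error rates, and threshold are assumptions:

```python
def posterior_threat(prior, reading, hit_rate=0.8, false_alarm=0.1):
    """P(threat | sensor reading) for a binary sensor with known error rates."""
    p_read_threat = hit_rate if reading else 1 - hit_rate
    p_read_clear = false_alarm if reading else 1 - false_alarm
    evidence = p_read_threat * prior + p_read_clear * (1 - prior)
    return p_read_threat * prior / evidence

def recommend_gear(prior, reading, threshold=0.2):
    """Recommend protective gear when the threat belief exceeds a threshold.

    A low threshold reflects the asymmetric costs in the missions: death (a
    prohibitive time penalty) is far worse than the slight delay of gearing up.
    """
    return posterior_threat(prior, reading) > threshold

# With a 30% prior, a positive reading pushes the belief well above 0.2,
# while a negative reading pulls it below the threshold.
```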
Participants in all three studies completed the 12-item DTI before starting their assigned mission(s). Full experimental details and the results of the treatment conditions are reported in the original publications, so for brevity’s sake we do not replicate those findings here. Note, however, that the n’s we report may differ from the original papers because of incomplete observations (some participants chose not to complete the DTI).
Study 1 participants (\(n=198\), Amazon Mechanical Turk) completed three missions, each with eight buildings [44]. They were randomly paired with one of two POMDP-based robot types: a high-ability robot that was never wrong or a low-ability robot that made mistakes 20% of the time (i.e., was 80% reliable). Both robot types were crossed with four recommendation-explanation conditions: none, confidence level, sensor readings version 1, and sensor readings version 2. The experiment was fully between subjects, meaning that each participant interacted with only one robot type and received only one type of explanation throughout the missions. The coefficient \(\beta _\textrm{Treat}\) in the models for Study 1 captures which information condition participants experienced. First compliance choice (FC) takes 1 if participants heeded the robot’s recommendation for the first building and 0 if not. Mission 1 compliance (M1C), on the other hand, is the fraction of times that a participant complied with the robot’s recommendations during the first mission. The compliance future behavior (FB) measure associated with FC for Study 1 is thus the fraction of times a participant complied for the remaining 23 buildings; for M1C, it is the fraction of times that a participant complied with the robot’s advice during missions two and three. Note that participants were not told whether they were interacting with the same robot across missions; instead, the robot started each mission as if it had never previously interacted with the participant.
Study 2 participants (\(n=53\), cadets at West Point) completed eight missions, each with a different POMDP-based robot [46]. In each mission, the human-robot team carried out a reconnaissance task covering 15 buildings. The mission order was fixed (i.e., buildings were always searched in the same order, both within and across missions), but the robot order was randomized. The \(2\times 2\times 2\) design crossed robot acknowledgment of mistakes (none/acknowledge), recommendation explanation (none/confidence), and embodiment (robot-like/doglike). Unlike Study 1, participants interacted with a different robot during each mission. Nevertheless, to demonstrate the robustness of the simple behavioral measures, we rely on the same first compliance choice (FC) as Study 1 and a similar mission 1 compliance (M1C). The compliance measures, obviously, cover a longer horizon: 119 and 105 buildings, respectively. The \(\beta _\textrm{Treat}\) of the models for Study 2 captures the robot type of the first mission. It is possible that the ordering of robot advisors mattered; however, the data are insufficient to specify a hierarchical model that would uncover such an effect.
Study 3 participants (\(n=148\), Amazon Mechanical Turk) completed one mission covering 45 buildings with an RL-based robot in a fully between-subjects design [14, 15, 31]. The treatment conditions held the robot’s ability constant but varied how it explained its recommendations: no explanation, explanation of its decision, or explanation of its decision and learning. Again, the first compliance choice (FC) is the same as in the previous two studies, and the FC outcome measure is the compliance fraction for the remaining 44 buildings. Mission 1 compliance (M1C) is not applicable given that the entire experiment consisted of a single mission. Because building order and robot performance were fixed across treatment conditions, however, the two additional compliance measures, choice after the first mistake (AFM) and average compliance through the first mistake (AC-AFM), become meaningful. The first mistake occurred in building six; thus, participants’ decision for building seven is the AFM measure, and the fraction of times they complied during the first seven buildings is AC-AFM. For both, the dependent variable is the fraction of times that a given participant complied during the remaining 38 buildings.
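For concreteness, the behavioral measures used across the three studies can be derived from a participant’s per-building compliance record in a few lines. This is a sketch under our own naming assumptions (the function and key names are illustrative, not from the original analysis code):

```python
def behavioral_measures(compliance, mission1_len=None, first_mistake_idx=None):
    """Derive behavioral predictors from a 0/1 per-building compliance record.

    compliance        -- 1 if the participant heeded the robot for that building
    mission1_len      -- buildings in mission 1 (8 in Study 1, 15 in Study 2)
    first_mistake_idx -- 0-based index of the robot's first mistake (Study 3)
    """
    measures = {"FC": compliance[0]}  # first compliance choice
    if mission1_len is not None:      # M1C: mission 1 compliance fraction
        measures["M1C"] = sum(compliance[:mission1_len]) / mission1_len
    if first_mistake_idx is not None:
        # AFM: the choice in the building right after the first mistake
        measures["AFM"] = compliance[first_mistake_idx + 1]
        # AC-AFM: average compliance up to and including that choice
        window = compliance[:first_mistake_idx + 2]
        measures["AC-AFM"] = sum(window) / len(window)
    return measures

# Study 3 style record: the robot's first mistake is in building six (index 5)
m = behavioral_measures([1, 1, 0, 1, 1, 1, 0, 1, 1], first_mistake_idx=5)
# m["FC"] == 1, m["AFM"] == 0, m["AC-AFM"] == 5/7
```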
Appendix C Tables
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Gurney, N., Pynadath, D.V., Wang, N. (2023). Comparing Psychometric and Behavioral Predictors of Compliance During Human-AI Interactions. In: Meschtscherjakov, A., Midden, C., Ham, J. (eds) Persuasive Technology. PERSUASIVE 2023. Lecture Notes in Computer Science, vol 13832. Springer, Cham. https://doi.org/10.1007/978-3-031-30933-5_12
Print ISBN: 978-3-031-30932-8
Online ISBN: 978-3-031-30933-5