An Adversarial Algorithm for Delegation

  • Juan Afanador
  • Murilo Baptista
  • Nir Oren
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11327)


Task delegation lies at the heart of the service economy, and is a fundamental aspect of many agent marketplaces. Research in computational trust considers which agent a task should be delegated to for execution, given that agent's past behaviour. However, such work does not consider the effects of that agent delegating the task onwards, forming a chain of delegations before the task is finally executed (as occurs in many human outsourcing scenarios). In this paper we consider such delegation chains, and empirically demonstrate that existing trust-based approaches do not handle these situations well. We then introduce a new algorithm based on quitting games to cater for recursive delegation.
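To make the setting concrete, here is a minimal Python sketch of a two-level delegation chain. It is not the authors' quitting-game algorithm. The agents A, B and C, their competences and delegation probabilities, and the choice of a Beta-reputation score as the stand-in "trust-based approach" are all hypothetical and chosen only for illustration. The root agent observes only the final outcome and credits it to the agent it delegated to directly, so that agent's score absorbs the behaviour of everyone further down the chain.

```python
# Hypothetical illustration of a delegation chain; NOT the paper's quitting-game algorithm.
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent that either executes a task itself or delegates it onward."""
    name: str
    competence: float                 # assumed P(success) when this agent executes
    delegate_prob: float = 0.0        # assumed P(passing the task to a successor)
    successors: list = field(default_factory=list)

    def handle(self, rng: random.Random, chain: list) -> bool:
        """Record this agent in the chain, then delegate onward or execute."""
        chain.append(self.name)
        if self.successors and rng.random() < self.delegate_prob:
            return rng.choice(self.successors).handle(rng, chain)
        return rng.random() < self.competence


class BetaTrust:
    """Beta-reputation score per direct delegatee: (s + 1) / (s + f + 2)."""

    def __init__(self) -> None:
        self.counts: dict = {}

    def update(self, name: str, success: bool) -> None:
        s, f = self.counts.get(name, (0, 0))
        self.counts[name] = (s + int(success), f + int(not success))

    def score(self, name: str) -> float:
        s, f = self.counts.get(name, (0, 0))
        return (s + 1) / (s + f + 2)


def simulate(rounds: int = 2000, seed: int = 0) -> tuple:
    """Root picks A or B uniformly at random and credits only the first hop."""
    rng = random.Random(seed)
    c = Agent("C", competence=0.2)                       # weak sub-contractor (assumed)
    a = Agent("A", competence=0.9, delegate_prob=0.8, successors=[c])
    b = Agent("B", competence=0.6)                       # always executes itself
    trust = BetaTrust()
    hops = {"A": [], "B": []}
    for _ in range(rounds):
        first = rng.choice([a, b])
        chain: list = []
        success = first.handle(rng, chain)
        trust.update(first.name, success)                # outcome credited to first hop only
        hops[first.name].append(len(chain))
    scores = {n: trust.score(n) for n in hops}
    mean_len = {n: sum(v) / len(v) for n, v in hops.items()}
    return scores, mean_len


if __name__ == "__main__":
    scores, mean_len = simulate()
    print("trust scores:", scores)
    print("mean chain length:", mean_len)
```

Under these assumed parameters, A's expected success rate is 0.8 × 0.2 + 0.2 × 0.9 = 0.34, against B's 0.6. A's trust score therefore ends up well below B's even though A is the most competent executor. The score measures the whole chain that A heads, not A's own execution.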


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Aberdeen, Aberdeen, Scotland
