Multi-agent Planning with High-Level Human Guidance

PRIMA 2020: Principles and Practice of Multi-Agent Systems (PRIMA 2020)

Abstract

Planning and coordination of multiple agents in the presence of uncertainty and noisy sensors is extremely hard. A human operator who observes a multi-agent team can provide valuable guidance to the team based on her superior ability to interpret observations and assess the overall situation. We propose an extension of decentralized POMDPs that allows such human guidance to be factored into the planning and execution processes. Human guidance in our framework consists of intuitive high-level commands that the agents must translate into a suitable joint plan that is sensitive to what they know from local observations. The result is a framework that allows multi-agent systems to benefit from the complex strategic thinking of a human supervising them. We evaluate this approach on several common benchmark problems and show that it can lead to dramatic improvement in performance.

This work was supported in part by the National Key R&D Program of China (Grant No. 2017YFB1002204), the National Natural Science Foundation of China (Grant No. U1613216, Grant No. 61603368), and the Guangdong Province Science and Technology Plan (Grant No. 2017B010110011).



Author information

Correspondence to Feng Wu.


A The Benchmark Problems

Fig. 3. The benchmark problems.

A.1 Meeting in a 3 \(\times\) 3 Grid

In this problem, as shown in Fig. 3(a), two robots R1 and R2 situated in a 3 \(\times\) 3 grid try to meet in the same cell as quickly as possible. There are 81 states in total since each robot can be in any of the 9 cells. Each robot has 5 actions: move up, down, left, right, or stay. The moving actions (i.e., all actions except stay) are stochastic: with probability 0.6 the robot moves in the desired direction, and with probability 0.1 each it moves in one of the other directions or stays in the same cell. There are 9 observations per robot; each robot can observe whether it is near one of the corners or walls. The robots may meet at any of the 4 corners. Once they meet at a corner, the agents receive a reward of 1. To make the problem more challenging, the agents are reset to their initial locations after they meet at a corner.
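To make the dynamics concrete, the following is a minimal Python sketch of the single-robot transition and the joint reward described above. The cell indexing (0–8, row-major), the helper names, and the reading that each of the four unintended outcomes occurs with probability 0.1 are our own assumptions, not quoted from the paper.

```python
import random

# Sketch of the single-robot dynamics of Meeting in a 3x3 Grid (assumed indexing).
# Cells are indexed 0..8 row-major; the corner cells are 0, 2, 6, and 8.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
CORNERS = {0, 2, 6, 8}

def move(cell, direction):
    """Deterministic effect of a move; the robot stays put if it would leave the grid."""
    row, col = divmod(cell, 3)
    dr, dc = MOVES[direction]
    nr, nc = row + dr, col + dc
    return nr * 3 + nc if 0 <= nr < 3 and 0 <= nc < 3 else cell

def sample_next_cell(cell, action):
    """With probability 0.6 the robot moves in the desired direction; with probability
    0.1 each it moves in one of the other three directions or stays. 'stay' is deterministic."""
    if action == "stay":
        return cell
    others = [d for d in MOVES if d != action]
    outcomes = [move(cell, action)] + [move(cell, d) for d in others] + [cell]
    return random.choices(outcomes, weights=[0.6, 0.1, 0.1, 0.1, 0.1])[0]

def reward(cell1, cell2):
    """Joint reward of 1 when both robots are in the same corner cell."""
    return 1.0 if cell1 == cell2 and cell1 in CORNERS else 0.0
```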

We design the high-level commands so that the robots are asked to meet at a specific corner. In more detail, the command set for this problem is \(C=\{ c_0, c_1, c_2, c_3, c_4 \}\), where \(c_0\) allows the agents to meet at any corner and \(c_i\) (\(i \ne 0\)) commands the robots to meet at the corner labeled \(i\) in Fig. 3(a). Depending on what we want to achieve in the problem, the commands can be more general (e.g., meet at either of the top corners) or more specific (e.g., meet at the top-left corner without going through the center). The reward function is implemented so that the robots are rewarded only when they meet at the specified corner (for \(c_0\), the original reward function is used). For example, \(R(c_1, \cdot , \cdot ) = 1\) only when they meet at the top-left corner and 0 otherwise.
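A minimal sketch of how such a command-conditioned reward could be implemented is shown below. The assignment of \(c_1\) to the top-left corner follows the paper's example; the assignments of \(c_2\)–\(c_4\) to particular corner cells are assumptions for illustration.

```python
# Hypothetical command-conditioned reward R(c, s, a) for the grid domain,
# using the same 0..8 row-major cell indexing as the previous sketch.
CORNERS = {0, 2, 6, 8}
COMMAND_CORNER = {1: 0, 2: 2, 3: 6, 4: 8}   # c_1 -> top-left (per the example); c_2..c_4 assumed

def command_reward(command, cell1, cell2):
    """Reward 1 only when both robots meet at the corner named by the command;
    c_0 (command == 0) keeps the original reward of meeting at any corner."""
    if cell1 != cell2 or cell1 not in CORNERS:
        return 0.0
    if command == 0:
        return 1.0
    return 1.0 if cell1 == COMMAND_CORNER[command] else 0.0
```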

During execution time, we simulate a random event that determines whether meeting at one of the corners yields a much higher reward (i.e., 100) and, if so, which corner. The event is a finite state machine (FSM) with 5 states, where state 0 means there is no highly rewarded corner and state \(i\) (\(1 \le i \le 4\)) means the corner labeled \(i\) has the highest reward. The transition function of this FSM is predetermined and fixed for all the tests, but it is not captured by the model and is unknown at planning time. Therefore, it is not considered in the agents' policies. The event can only be observed by the operator at runtime. This kind of stochastic event can be used to simulate a disaster-response scenario, where a group of robots with pre-computed plans is sent to search for and rescue victims at several locations (i.e., the corners). For each location, the robots must cooperate and work together (i.e., meet at the corner). As more information (e.g., messages reported by people nearby) is collected at the base station, one of the locations may become more likely to contain victims. Thus, the operator should guide the robots to search that location and rescue the victims there.
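The sketch below illustrates such a 5-state FSM and the runtime reward it induces. The paper does not give the actual (fixed) transition matrix, so the probabilities here are placeholder values that merely sum to 1 per row.

```python
import random

# Illustrative 5-state FSM for the runtime event; the rows are placeholders.
FSM_TRANSITIONS = [
    # to:  s0    s1    s2    s3    s4
    [0.60, 0.10, 0.10, 0.10, 0.10],  # s0: no highly rewarded corner
    [0.20, 0.80, 0.00, 0.00, 0.00],  # s1: corner 1 currently has the highest reward
    [0.20, 0.00, 0.80, 0.00, 0.00],  # s2: corner 2
    [0.20, 0.00, 0.00, 0.80, 0.00],  # s3: corner 3
    [0.20, 0.00, 0.00, 0.00, 0.80],  # s4: corner 4
]

def fsm_step(state):
    """Sample the next FSM state; only the human operator observes this at runtime."""
    return random.choices(range(5), weights=FSM_TRANSITIONS[state])[0]

def meeting_reward(fsm_state, met_corner_label):
    """Reward 100 when the robots meet at the currently highly rewarded corner,
    and the original reward of 1 for meeting at any other corner."""
    return 100.0 if fsm_state != 0 and met_corner_label == fsm_state else 1.0
```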

A.2 Cooperative Box-Pushing

In this problem, as shown in Fig. 3(b), two robots R1 and R2 in a 3 \(\times\) 3 grid try to push the large box (LB) together or to push the small boxes (SB) independently. Each robot can turn left, turn right, move forward, or stay, so there are 4 actions per robot. For each action, with probability 0.9 the robot turns in the desired direction or moves forward as intended, and with probability 0.1 it stays in the same position and orientation. Each robot has 5 observations to identify the object in front of it, which can be an empty field, a wall, the other robot, a small box, or the large box. For each robot, executing an action has a cost of 0.1 for energy consumption. If a robot bumps into a wall, the other robot, or a box without pushing it, it gets a penalty of 5. The standard reward function is designed to encourage cooperation: the reward for cooperatively pushing the large box is 100, while the reward for pushing a small box is just 10. Each run consists of 100 steps. Once a box is pushed to its goal location, the robots are reset to an initial state.
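For concreteness, below is a minimal, self-contained sketch of the per-robot motion dynamics and reward constants described above. Boxes and collisions are omitted, and the heading encoding and helper names are assumptions.

```python
import random

# Sketch of the per-robot dynamics of Cooperative Box-Pushing (boxes omitted).
HEADINGS = [(-1, 0), (0, 1), (1, 0), (0, -1)]   # north, east, south, west

ACTION_COST      = -0.1    # energy cost of executing any action
BUMP_PENALTY     = -5.0    # bumping into a wall, the other robot, or a box without pushing it
SMALL_BOX_REWARD = 10.0    # pushing a small box to its goal location
LARGE_BOX_REWARD = 100.0   # cooperatively pushing the large box to its goal location

def apply_action(row, col, heading, action):
    """With probability 0.9 the action succeeds; with probability 0.1 the robot keeps
    its current position and orientation."""
    if action == "stay" or random.random() < 0.1:
        return row, col, heading
    if action == "turn_left":
        return row, col, (heading - 1) % 4
    if action == "turn_right":
        return row, col, (heading + 1) % 4
    # "move_forward": stay put at the grid boundary (where the bump penalty would apply)
    dr, dc = HEADINGS[heading]
    nr, nc = row + dr, col + dc
    return (nr, nc, heading) if 0 <= nr < 3 and 0 <= nc < 3 else (row, col, heading)
```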

The high-level commands \(C=\{ c_0, c_1, c_2, c_3 \}\) are designed as follows: (\(c_0\)) the robots may push any box; (\(c_1\)) the robots should only push the small box on the left side; (\(c_2\)) the robots should only push the small box on the right side; (\(c_3\)) the robots should only push the large box in the middle. Specifying the corresponding reward function is straightforward. For \(c_0\), we use the original reward function. For \(c_i\) (\(1\le i \le 3\)), we reward the agents (\(+100\)) for pushing the designated box and penalize them (\(-100\)) for pushing any other box.
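A small sketch of this command-conditioned shaping is given below; the box identifiers are our own names, and the fallback to the original reward under \(c_0\) follows the text.

```python
# Hypothetical command-conditioned shaping for Box-Pushing (assumed box identifiers).
TARGET_BOX = {1: "small_left", 2: "small_right", 3: "large"}   # c_1, c_2, c_3

def command_push_reward(command, pushed_box, original_reward):
    """Under c_0 the standard reward is used; under c_i the agents get +100 for pushing
    the designated box and -100 for pushing any other box."""
    if command == 0 or pushed_box is None:
        return original_reward
    return 100.0 if pushed_box == TARGET_BOX[command] else -100.0
```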

Similar to the previous domain, we also simulate a random event representing a trapped animal in one of the cells labeled with numbers in Fig. 3(b). The animal is hidden behind a box, so the robots cannot see it with their cameras. However, if the robots push a box while an animal is on the other side of that box, the animal gets injured and the robots receive a high penalty of 100. The random event is modeled by an FSM with 5 states, where state 0 represents no animal and state \(i\) (\(1 \le i \le 4\)) means an animal is trapped in the cell labeled \(i\). If the animal gets injured, the FSM transitions to another state according to a predefined transition function. Again, this event is neither captured by the agents' model nor considered in their policies. The animal can only be observed by the operator during execution time, using an additional camera placed behind the boxes. This setting allows us to simulate scenarios where operators supervise robots performing risk-sensitive tasks, for example, robots doing construction work on a crowded street.
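The runtime penalty check could look as follows. Since the layout of the labeled cells in Fig. 3(b) is not reproduced here, the mapping from each box to the cell(s) behind it is an assumption for illustration.

```python
# Hypothetical runtime penalty for the hidden-animal event (assumed box-to-cell mapping).
ANIMAL_PENALTY = -100.0
CELLS_BEHIND = {"small_left": {1}, "large": {2, 3}, "small_right": {4}}

def animal_penalty(fsm_state, pushed_box):
    """State 0 means no animal; state i means an animal is trapped in the cell labeled i.
    The penalty applies only when a pushed box has the animal's cell behind it."""
    if fsm_state == 0 or pushed_box is None:
        return 0.0
    return ANIMAL_PENALTY if fsm_state in CELLS_BEHIND[pushed_box] else 0.0
```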

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, F., Zilberstein, S., Jennings, N.R. (2021). Multi-agent Planning with High-Level Human Guidance. In: Uchiya, T., Bai, Q., Marsá Maestre, I. (eds) PRIMA 2020: Principles and Practice of Multi-Agent Systems. PRIMA 2020. Lecture Notes in Computer Science, vol 12568. Springer, Cham. https://doi.org/10.1007/978-3-030-69322-0_12

  • DOI: https://doi.org/10.1007/978-3-030-69322-0_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-69321-3

  • Online ISBN: 978-3-030-69322-0
