Multi-objective Optimal Control for Proactive Decision Making with Temporal Logic Models

  • Sandeep P. Chinchali (email author)
  • Scott C. Livingston
  • Marco Pavone
Conference paper
Part of the Springer Proceedings in Advanced Robotics book series (SPAR, volume 10)

Abstract

The operation of today’s robots entails interactions with humans, in settings ranging from autonomous driving amidst human-driven vehicles to collaborative manufacturing. To do so effectively, robots must proactively decode the intent or plan of humans and concurrently leverage such knowledge for safe, cooperative task satisfaction, a problem we refer to as proactive decision making. However, the problem of proactive intent decoding coupled with robotic control is computationally intractable, as a robot must reason over several possible human behavioral models and the resulting high-dimensional state trajectories. In this paper, we address the proactive decision making problem using a novel combination of algorithmic and data mining techniques. First, we distill high-dimensional state trajectories of human-robot interaction into concise, symbolic behavioral summaries that can be learned from data. Second, we leverage formal methods to model high-level agent goals, safe interaction, and information-seeking behavior with temporal logic formulae. Finally, we design a novel decision-making scheme that simply maintains a belief distribution over high-level, symbolic models of human behavior, and proactively plans informative control actions. Leveraging a rich dataset of real human driving data in crowded merging scenarios, we generate temporal logic models and use them to synthesize control strategies using tree-based value iteration and reinforcement learning (RL). Results from cooperative and adversarial simulated self-driving car scenarios demonstrate that our data-driven control strategies enable safe interaction, correct model identification, and significant dimensionality reduction.
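
As a rough illustration of what maintaining a belief distribution over a small set of symbolic behavior models can look like, the sketch below applies Bayes' rule after one observation. It is not the authors' implementation: the model names, the observation, and all probabilities are hypothetical values chosen for the example.

```python
from typing import Dict


def update_belief(belief: Dict[str, float],
                  likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Bayes update over a finite set of candidate behavior models.

    belief[m]      -- prior probability that model m describes the human
    likelihoods[m] -- probability of the latest observation under model m
    """
    unnormalized = {m: belief[m] * likelihoods[m] for m in belief}
    total = sum(unnormalized.values())
    if total == 0.0:
        # The observation is impossible under every model; keep the prior.
        return dict(belief)
    return {m: p / total for m, p in unnormalized.items()}


# Hypothetical example: two candidate models of the other driver.
prior = {"cooperative": 0.5, "adversarial": 0.5}
# Assumed probability of observing "the other car yields" under each model.
likelihood_of_yield = {"cooperative": 0.8, "adversarial": 0.2}

posterior = update_belief(prior, likelihood_of_yield)
print(posterior)
```

An informative control action, in this picture, is one whose likely observations differ strongly across the candidate models, so that a single update moves the belief far from the prior.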

Keywords

Decision-making, Formal methods, Human-robot interaction, Data-mining

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Sandeep P. Chinchali (1), email author
  • Scott C. Livingston (2)
  • Marco Pavone (1)

  1. Stanford University, Stanford, USA
  2. California Institute of Technology, Pasadena, USA
