Safe Exploration for Active Learning with Gaussian Processes

  • Jens Schreiter
  • Duy Nguyen-Tuong
  • Mona Eberts
  • Bastian Bischoff
  • Heiner Markert
  • Marc Toussaint
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9286)

Abstract

In this paper, the problem of safe exploration in the active learning context is considered. Safe exploration is especially important for data sampling from technical and industrial systems, e.g., combustion engines and gas turbines, where critical and unsafe measurements need to be avoided. The objective is to learn data-based regression models from such technical systems using a limited budget of measured, i.e., labelled, points while ensuring that critical regions of the considered systems are avoided during measurements. We propose an approach for learning such models and exploring new data regions based on Gaussian processes (GPs). In particular, we employ a problem-specific GP classifier to identify safe and unsafe regions, while using a differential entropy criterion for exploring relevant data regions. We provide a theoretical analysis of the proposed algorithm, including an upper bound on the probability of failure. To demonstrate the efficiency and robustness of our safe exploration scheme in the active learning setting, we test the approach on a policy exploration task for the inverse pendulum hold-up problem.
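The selection rule the abstract describes, querying the most informative point while staying inside the region a GP safety model deems safe, can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: a GP regressor on ±1 safety labels stands in for the paper's problem-specific GP classifier, and the differential-entropy criterion reduces, for a GP, to picking the candidate of maximal predictive variance. All names and hyperparameters (`kappa`, the kernel lengthscale) are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, ls=0.2):
    # Squared-exponential kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-4, ls=0.2):
    # Standard zero-mean GP regression posterior at test points Xs.
    K = rbf_kernel(X, X, ls) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs, ls)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - (v ** 2).sum(axis=0), 1e-12, None)
    return mean, var

def next_safe_query(X, y, h, candidates, kappa=2.0):
    """Choose the candidate with maximal differential entropy among
    points the safety model deems safe with high confidence.

    X          -- inputs measured so far
    y          -- regression targets at X
    h          -- safety labels at X (+1 safe, -1 unsafe)
    candidates -- pool of possible next measurement locations
    kappa      -- confidence multiplier for the safety constraint
    """
    m_h, v_h = gp_posterior(X, h, candidates)
    safe = m_h - kappa * np.sqrt(v_h) > 0.0       # conservative safety test
    _, v_f = gp_posterior(X, y, candidates)
    ent = 0.5 * np.log(2.0 * np.pi * np.e * v_f)  # GP differential entropy
    ent[~safe] = -np.inf                          # never query unsafe points
    return int(np.argmax(ent)), safe
```

On a toy one-dimensional problem with three safe and one unsafe observation, this rule queries the most uncertain location that still passes the conservative safety test; candidates near the unsafe observation are excluded.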

Keywords

Discriminative Function · Gaussian Process · Input Space · Decision Boundary · Exploration Scheme
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Jens Schreiter (1)
  • Duy Nguyen-Tuong (1)
  • Mona Eberts (1)
  • Bastian Bischoff (1)
  • Heiner Markert (1)
  • Marc Toussaint (2)
  1. Robert Bosch GmbH, Stuttgart, Germany
  2. University of Stuttgart, MLR Laboratory, Stuttgart, Germany