Active Learning of Dynamic Bayesian Networks in Markov Decision Processes

  • Anders Jonsson
  • Andrew Barto
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4612)

Abstract

Several recent techniques for solving Markov decision processes use dynamic Bayesian networks to compactly represent tasks. When the dynamic Bayesian network representation is not given, it must be learned before these techniques can be applied. We develop an algorithm for learning dynamic Bayesian network representations of Markov decision processes from data collected through exploration in the environment. To accelerate data collection, we develop a novel scheme for active learning of the networks. We assume that the process cannot be sampled in arbitrary states, only along trajectories, which prevents us from applying existing active learning techniques. Our active learning scheme selects actions that maximize the total entropy of the distributions used to evaluate potential refinements of the networks.
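Since the action-selection rule is the heart of the scheme, a compact illustration may help. The following is a minimal Python sketch under stated assumptions: the names entropy, select_action, and refinement_stats, and the layout mapping each action to per-refinement outcome counts, are hypothetical scaffolding rather than the authors' implementation; only the decision rule itself, picking the action that maximizes the total entropy of the distributions evaluating candidate refinements, comes from the abstract.

    import math

    def entropy(counts):
        # Shannon entropy (in nats) of the empirical distribution
        # induced by a dict of raw outcome counts.
        total = sum(counts.values())
        if total == 0:
            return 0.0
        return -sum((c / total) * math.log(c / total)
                    for c in counts.values() if c > 0)

    def select_action(actions, refinement_stats):
        # refinement_stats[action] is a list of dicts, one per candidate
        # refinement of the network, mapping observed outcomes to counts
        # for the conditional distribution that would evaluate that
        # refinement. (This data layout is assumed for illustration.)
        def total_entropy(action):
            return sum(entropy(counts) for counts in refinement_stats[action])
        return max(actions, key=total_entropy)

    # Toy usage: 'right' still has a poorly resolved outcome distribution,
    # so its total entropy is higher and it is chosen for the next sample.
    stats = {'left':  [{'s1': 9, 's2': 1}],
             'right': [{'s1': 4, 's2': 6}]}
    print(select_action(['left', 'right'], stats))  # -> right

Ties and exploration schedules are omitted; in a full agent this rule could only steer the current trajectory, since the assumption above rules out sampling the process in arbitrary states.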

Keywords

Active Learning · Bayesian Network · Bayesian Information Criterion · Markov Decision Process · Dynamic Bayesian Network

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Anders Jonsson (1)
  • Andrew Barto (2)
  1. Department of Information and Communication Technologies, Universitat Pompeu Fabra, Passeig de Circumval·lació 8, 08003 Barcelona, Spain
  2. Autonomous Learning Laboratory, Department of Computer Science, University of Massachusetts, Amherst, MA 01003, USA
