Abstract
Solving finite-horizon Markov Decision Processes with stationary policies is a computationally difficult problem. Our dynamic dual decomposition approach uses Lagrange duality to decouple this hard problem into a sequence of tractable sub-problems. The resulting procedure is a straightforward modification of standard non-stationary Markov Decision Process solvers and gives an upper bound on the total expected reward. The empirical performance of the method suggests that the algorithm not only converges rapidly but also performs favourably compared to standard planning algorithms such as policy gradients and to lower-bound procedures such as Expectation Maximisation.
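The upper bound mentioned in the abstract comes from a standard relaxation: restricting to stationary policies can only lower the achievable reward, so optimising over time-varying policies by backward induction bounds the stationary optimum from above. The sketch below illustrates only this λ = 0 case of the dual bound on a toy MDP; it is not the paper's algorithm, and the 3-state problem, the random numbers, and the helper names (`dual_value`, `stationary_value`) are all illustrative assumptions.

```python
import numpy as np

# Hypothetical toy MDP: 3 states, 2 actions, horizon 5.
rng = np.random.default_rng(0)
S, A, H = 3, 2, 5
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state distribution
R = rng.random((S, A))                       # immediate rewards
mu0 = np.ones(S) / S                         # uniform start distribution

def dual_value(lam):
    """Backward induction on the lambda-modified rewards R + lam[t].

    Optimising over *time-varying* policies relaxes the stationarity
    constraint, so whenever the multipliers sum to zero over time this
    is a valid upper bound on any stationary policy's expected reward.
    """
    V = np.zeros(S)
    for t in reversed(range(H)):
        Q = R + lam[t] + P @ V               # Q[s, a], (S, A, S) @ (S,) -> (S, A)
        V = Q.max(axis=1)
    return mu0 @ V

def stationary_value(pi):
    """Expected H-step reward of a deterministic stationary policy pi[s]."""
    V = np.zeros(S)
    idx = np.arange(S)
    for _ in range(H):
        V = R[idx, pi] + P[idx, pi] @ V
    return mu0 @ V

# lam = 0 already satisfies the zero-sum constraint and gives a valid bound.
lam = np.zeros((H, S, A))
upper = dual_value(lam)
best_stationary = max(stationary_value(np.array(pi))
                      for pi in np.ndindex(*(A,) * S))
assert upper >= best_stationary - 1e-9
```

The full procedure would then tighten this bound by minimising `dual_value` over the multipliers `lam` with a subgradient method, projecting after each step so they sum to zero across time; that loop is omitted here.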
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Furmston, T., Barber, D. (2011). Lagrange Dual Decomposition for Finite Horizon Markov Decision Processes. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science(), vol 6911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23780-5_41
Print ISBN: 978-3-642-23779-9
Online ISBN: 978-3-642-23780-5