Abstract
Solving finite-horizon Markov Decision Processes with stationary policies is a computationally difficult problem. Our dynamic dual decomposition approach uses Lagrange duality to decouple this hard problem into a sequence of tractable sub-problems. The resulting procedure is a straightforward modification of standard non-stationary Markov Decision Process solvers and gives an upper bound on the total expected reward. The empirical performance of the method suggests that the algorithm not only converges rapidly but also performs favourably compared to standard planning algorithms such as policy gradients and to lower-bound procedures such as Expectation Maximisation.
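The upper bound mentioned in the abstract comes from a standard relaxation: restricting to stationary policies can only lower the achievable reward, so optimising over time-varying policies by backward induction bounds the stationary optimum from above. The sketch below illustrates only this λ = 0 case of the dual bound on a toy MDP; it is not the paper's algorithm, and the 3-state problem, the random numbers, and the helper names (`dual_value`, `stationary_value`) are all illustrative assumptions.

```python
import numpy as np

# Hypothetical toy MDP: 3 states, 2 actions, horizon 5.
rng = np.random.default_rng(0)
S, A, H = 3, 2, 5
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state distribution
R = rng.random((S, A))                       # immediate rewards
mu0 = np.ones(S) / S                         # uniform start distribution

def dual_value(lam):
    """Backward induction on the lambda-modified rewards R + lam[t].

    Optimising over *time-varying* policies relaxes the stationarity
    constraint, so whenever the multipliers sum to zero over time this
    is a valid upper bound on any stationary policy's expected reward.
    """
    V = np.zeros(S)
    for t in reversed(range(H)):
        Q = R + lam[t] + P @ V               # Q[s, a], (S, A, S) @ (S,) -> (S, A)
        V = Q.max(axis=1)
    return mu0 @ V

def stationary_value(pi):
    """Expected H-step reward of a deterministic stationary policy pi[s]."""
    V = np.zeros(S)
    idx = np.arange(S)
    for _ in range(H):
        V = R[idx, pi] + P[idx, pi] @ V
    return mu0 @ V

# lam = 0 already satisfies the zero-sum constraint and gives a valid bound.
lam = np.zeros((H, S, A))
upper = dual_value(lam)
best_stationary = max(stationary_value(np.array(pi))
                      for pi in np.ndindex(*(A,) * S))
assert upper >= best_stationary - 1e-9
```

The full procedure would then tighten this bound by minimising `dual_value` over the multipliers `lam` with a subgradient method, projecting after each step so they sum to zero across time; that loop is omitted here.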
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Furmston, T., Barber, D. (2011). Lagrange Dual Decomposition for Finite Horizon Markov Decision Processes. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science(), vol 6911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23780-5_41
Print ISBN: 978-3-642-23779-9
Online ISBN: 978-3-642-23780-5