Computational Economics, Volume 27, Issue 4, pp 433–452

Approximate Policy Optimization and Adaptive Control in Regression Models



In this paper we use recent advances in approximate dynamic programming to develop an approximate policy optimization procedure that uses Monte Carlo simulation for the numerical solution of dynamic optimization problems in economics. The procedure is applied to the classical problem of “learning by doing” in regression models, for which the value and extent of active experimentation are demonstrated in a variety of numerical studies.
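As a rough illustration of the rollout idea underlying such a procedure, the sketch below applies one-step lookahead with Monte Carlo evaluation of a certainty-equivalence base policy to a simple regression control problem. The model, loss, estimate-update rule, sampling scheme, and all names are assumptions made for illustration only, not the paper's implementation.

    import numpy as np

    # Hypothetical sketch: adaptive control of y = alpha + beta * x + noise with
    # unknown (alpha, beta) and quadratic loss y**2. All names and the crude
    # update/sampling rules below are illustrative assumptions.

    def base_policy_loss(alpha, beta, est, horizon, noise_sd, rng):
        """Simulate the certainty-equivalence base policy and return its total loss."""
        a_hat, b_hat = est
        loss = 0.0
        for _ in range(horizon):
            x = -a_hat / b_hat if abs(b_hat) > 1e-6 else 0.0  # CE control aiming at y = 0
            y = alpha + beta * x + rng.normal(0.0, noise_sd)
            loss += y ** 2
            # crude stochastic-approximation update of the estimates ("learning by doing")
            err = y - (a_hat + b_hat * x)
            a_hat, b_hat = a_hat + 0.1 * err, b_hat + 0.1 * err * x
        return loss

    def rollout_control(candidates, est, horizon, noise_sd, n_sims=200, seed=0):
        """One-step lookahead: score each candidate control by Monte Carlo rollouts
        of the base policy and return the one with the smallest average loss."""
        rng = np.random.default_rng(seed)
        best_x, best_val = None, np.inf
        for x in candidates:
            total = 0.0
            for _ in range(n_sims):
                # sample the unknown parameters from a rough posterior approximation
                alpha = rng.normal(est[0], 0.5)
                beta = rng.normal(est[1], 0.5)
                y = alpha + beta * x + rng.normal(0.0, noise_sd)
                # update the estimates with the new observation before continuing
                err = y - (est[0] + est[1] * x)
                new_est = (est[0] + 0.1 * err, est[1] + 0.1 * err * x)
                total += y ** 2 + base_policy_loss(alpha, beta, new_est,
                                                   horizon - 1, noise_sd, rng)
            if total / n_sims < best_val:
                best_x, best_val = x, total / n_sims
        return best_x

    if __name__ == "__main__":
        x_star = rollout_control(candidates=np.linspace(-2.0, 2.0, 9),
                                 est=(1.0, -0.5), horizon=10, noise_sd=1.0)
        print("rollout-selected control:", x_star)

In a sketch of this kind, a candidate control away from the myopic certainty-equivalence choice can win the comparison precisely because it is more informative about the unknown slope, which is the value of active experimentation that the paper's numerical studies quantify.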


Keywords: dynamic programming, policy iteration, rollout, Monte Carlo, learning by doing





Copyright information

© Springer Science+Business Media, Inc. 2006

Authors and Affiliations

  • Jiarui Han (1)
  • Tze Leung Lai (1)
  • Viktor Spivakovsky (2)

  1. Barclays Global Investors, San Francisco, USA
  2. Citadel Investment Group, Chicago, USA
