On the Sample Complexity of the Linear Quadratic Regulator

  • Sarah Dean
  • Horia Mania
  • Nikolai Matni
  • Benjamin Recht (corresponding author)
  • Stephen Tu
Article

Abstract

This paper addresses the optimal control problem known as the linear quadratic regulator in the case when the dynamics are unknown. We propose a multistage procedure, called Coarse-ID control, that estimates a model from a few experimental trials, estimates the error in that model with respect to the truth, and then designs a controller using both the model and uncertainty estimate. Our technique uses contemporary tools from random matrix theory to bound the error in the estimation procedure. We also employ a recently developed approach to control synthesis called System Level Synthesis that enables robust control design by solving a quasi-convex optimization problem. We provide end-to-end bounds on the relative error in control cost that are optimal in the number of parameters and that highlight salient properties of the system to be controlled such as closed-loop sensitivity and optimal control magnitude. We show experimentally that the Coarse-ID approach enables efficient computation of a stabilizing controller in regimes where simple control schemes that do not take the model uncertainty into account fail to stabilize the true system.
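The identification stage and a nominal design stage can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the two-state system `A_true`, `B_true`, the noise levels, the number of rollouts, the ordinary least-squares estimator, and the helper names. The controller is the nominal LQR gain computed from the estimate by Riccati iteration; it is not the robust System Level Synthesis design, and it omits the uncertainty estimate, so it corresponds to the kind of simple scheme that the abstract notes can fail when the model error is large.

```python
import numpy as np


def simulate_rollouts(A, B, num_rollouts, horizon, sigma_u, sigma_w, rng):
    """Collect (x_t, u_t, x_{t+1}) triples from independent rollouts with x_0 = 0."""
    n, p = B.shape
    X, U, Xn = [], [], []
    for _ in range(num_rollouts):
        x = np.zeros(n)
        for _ in range(horizon):
            u = sigma_u * rng.standard_normal(p)          # random excitation
            x_next = A @ x + B @ u + sigma_w * rng.standard_normal(n)
            X.append(x)
            U.append(u)
            Xn.append(x_next)
            x = x_next
    return np.array(X), np.array(U), np.array(Xn)


def least_squares_estimate(X, U, Xn):
    """Fit x_{t+1} ~ A x_t + B u_t by ordinary least squares."""
    n = X.shape[1]
    Z = np.hstack([X, U])                                 # regressors [x_t, u_t]
    Theta, *_ = np.linalg.lstsq(Z, Xn, rcond=None)        # Xn ~ Z @ Theta
    return Theta[:n].T, Theta[n:].T                       # (A_hat, B_hat)


def nominal_lqr_gain(A, B, Q, R, iters=2000, tol=1e-10):
    """Iterate the discrete-time Riccati recursion; return K for u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


# Hypothetical, slightly unstable system used only for this illustration.
rng = np.random.default_rng(0)
A_true = np.array([[1.01, 0.10], [0.00, 1.01]])
B_true = np.eye(2)

X, U, Xn = simulate_rollouts(A_true, B_true, num_rollouts=200, horizon=10,
                             sigma_u=1.0, sigma_w=0.1, rng=rng)
A_hat, B_hat = least_squares_estimate(X, U, Xn)
K = nominal_lqr_gain(A_hat, B_hat, Q=np.eye(2), R=np.eye(2))

est_err = max(np.linalg.norm(A_hat - A_true, 2), np.linalg.norm(B_hat - B_true, 2))
rho = np.max(np.abs(np.linalg.eigvals(A_true - B_true @ K)))
print("estimation error (spectral norm):", est_err)
print("closed-loop spectral radius on the true system:", rho)
```

With plenty of data the nominal gain stabilizes this toy system; shrinking `num_rollouts` is one way to explore the low-data regime in which a design that accounts for the estimation error becomes relevant.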

Keywords

Optimal control, Robust control, System identification, Statistical learning theory, Reinforcement learning, System level synthesis

Mathematics Subject Classification

49N05 93D09 93D21 93E12 93E35 

Notes

Acknowledgements

We thank Ross Boczar, Qingqing Huang, Laurent Lessard, Michael Littman, Manfred Morari, Andrew Packard, Anders Rantzer, Daniel Russo, and Ludwig Schmidt for many helpful comments and suggestions. We also thank the anonymous referees for making several suggestions that have significantly improved the paper and its presentation.

Copyright information

© SFoCM 2019

Authors and Affiliations

  • Sarah Dean (1)
  • Horia Mania (1)
  • Nikolai Matni (2)
  • Benjamin Recht (1), corresponding author
  • Stephen Tu (1)

  1. Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA
  2. Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA
