Abstract

Performance optimization plays an important role in the design and operation of modern engineering systems in many areas, including communications (Internet and wireless networks), manufacturing, logistics, robotics, and bioinformatics. Most engineering systems are too complicated to be analyzed exactly, or the parameters of their system models cannot be easily obtained; therefore, learning techniques have to be applied.

Keywords

Optimal Policy · Sample Path · Perturbation Analysis · Transition Probability Matrix · Policy Space

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

Hong Kong University of Science and Technology, Kowloon, Hong Kong
