Abstract
In the context of probabilistic verification, we introduce a new notion of trace-equivalence divergence between pairs of labelled Markov processes. This divergence corresponds to the optimal value of a derived Markov decision process and can therefore be estimated by reinforcement learning methods. Moreover, we provide PAC guarantees on this estimate.
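Since the divergence is cast as the optimal value of a derived Markov decision process, any value-based reinforcement learning method can in principle estimate it. Below is a minimal tabular Q-learning sketch in that spirit, run on a hypothetical toy MDP; the states, actions, and transition function here are placeholder assumptions for illustration, not the paper's derived construction.

```python
import random
from collections import defaultdict

# Hypothetical toy MDP standing in for the paper's derived MDP:
# these states, actions, and transitions are assumptions, not the
# authors' construction.
STATES = range(4)
ACTIONS = range(2)
TERMINAL = 3

def step(state, action):
    """Return (next_state, reward) under a toy stochastic transition."""
    if random.random() < 0.8:
        nxt = min(state + action + 1, TERMINAL)
    else:
        nxt = state  # transition fails, stay in place
    reward = 1.0 if nxt == TERMINAL else 0.0
    return nxt, reward

def q_learning(episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning (Watkins, 1989); returns the learned Q-table."""
    q = defaultdict(float)
    for _ in range(episodes):
        state = 0
        while state != TERMINAL:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                action = random.choice(list(ACTIONS))
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward = step(state, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            # standard Q-learning temporal-difference update
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = nxt
    return q

if __name__ == "__main__":
    q = q_learning()
    # The optimal value of the start state is the quantity a method
    # like the paper's would estimate (here, just the toy MDP's value).
    print(max(q[(0, a)] for a in ACTIONS))
```

In the paper's setting, the estimate's quality would further be controlled by PAC-style bounds on Q-learning's finite-sample behaviour; this sketch only shows the estimation mechanism itself.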
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Desharnais, J., Laviolette, F., Moturu, K.P.D., Zhioua, S. (2006). Trace Equivalence Characterization Through Reinforcement Learning. In: Lamontagne, L., Marchand, M. (eds.) Advances in Artificial Intelligence. Canadian AI 2006. Lecture Notes in Computer Science, vol. 4013. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11766247_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34628-9
Online ISBN: 978-3-540-34630-2