Language Resources and Evaluation, Volume 40, Issue 1, pp 47–66

Evaluating the Markov assumption in Markov Decision Processes for spoken dialogue management

Original Paper


The goal of dialogue management in a spoken dialogue system is to take actions based on observations and inferred beliefs. To ensure that the actions optimize the performance or robustness of the system, researchers have turned to reinforcement learning methods to learn policies for action selection. To derive an optimal policy from data, the dynamics of the system are often represented as a Markov Decision Process (MDP), which assumes that the state of the dialogue depends only on the previous state and action. In this article, we investigate whether constraining the state space by the Markov assumption, especially when the structure of the state space may be unknown, truly affords the highest reward. In simulation experiments conducted in the context of a dialogue system for interacting with a speech-enabled web browser, models under the Markov assumption did not perform as well as an alternative model that classifies the total reward with accumulating features. We discuss the implications of the study as well as its limitations.
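To make the Markov assumption concrete: under an MDP, the probability of the next dialogue state depends only on the current state and action, P(s_{t+1} | s_0, a_0, ..., s_t, a_t) = P(s_{t+1} | s_t, a_t), so a learner such as tabular Q-learning can update its value estimates from a single (state, action, reward, next state) transition, discarding all earlier history. The sketch below is a hypothetical illustration of this point, not code from the article; the toy dialogue states, transition probabilities, and reward values are invented for exposition.

    import random

    # Toy dialogue MDP: states describe the grounding status of the
    # dialogue, actions are system moves. All numbers are illustrative.
    STATES = ["misunderstanding", "grounded"]
    ACTIONS = ["confirm", "proceed"]

    def step(state, action):
        # Simulated dynamics: the next state and reward depend only on
        # the current state and action -- the Markov assumption.
        if state == "misunderstanding":
            if action == "confirm":
                # Confirming usually repairs the misunderstanding,
                # at the cost of an extra turn.
                nxt = "grounded" if random.random() < 0.8 else "misunderstanding"
                return nxt, -1
            return "misunderstanding", -5  # proceeding on an error is costly
        return "grounded", (10 if action == "proceed" else -1)

    # Tabular Q-learning: each update uses only the current transition;
    # any history earlier than `state` is assumed irrelevant.
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    alpha, gamma, epsilon = 0.1, 0.9, 0.1
    for _ in range(5000):
        state = "misunderstanding"
        for _ in range(10):
            if random.random() < epsilon:  # epsilon-greedy exploration
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward = step(state, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt

    for s in STATES:
        print(s, "->", max(ACTIONS, key=lambda a: q[(s, a)]))

The alternative model studied in the article, by contrast, predicts the total reward from features that accumulate over the whole dialogue rather than from the current state alone, which is precisely what the Markov assumption rules out.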


Keywords: Spoken dialogue · Dialogue management · Markov assumption



Copyright information

© Springer Science+Business Media 2006

Authors and Affiliations

  1. Microsoft Research, Redmond, USA
