
Evaluating the Markov assumption in Markov Decision Processes for spoken dialogue management


Abstract

The goal of dialogue management in a spoken dialogue system is to take actions based on observations and inferred beliefs. To ensure that the actions optimize the performance or robustness of the system, researchers have turned to reinforcement learning methods to learn policies for action selection. To derive an optimal policy from data, the dynamics of the system are often represented as a Markov Decision Process (MDP), which assumes that the state of the dialogue depends only on the previous state and action. In this article, we investigate whether constraining the state space by the Markov assumption, especially when the structure of the state space may be unknown, truly affords the highest reward. In simulation experiments conducted in the context of a dialogue system for interacting with a speech-enabled web browser, models under the Markov assumption did not perform as well as an alternative model that classifies the total reward with accumulating features. We discuss the implications of the study as well as its limitations.
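To make the contrast concrete, the sketch below sets a tabular Q-learning update, which honors the Markov assumption by conditioning only on the current state and action, against a feature accumulator of the kind the alternative model classifies total reward over. All state names, actions, rewards, and feature choices here are invented for illustration; they are not the representation used in the study.

    # Illustrative sketch only (invented states, actions, rewards, features):
    # contrasts policy learning under the Markov assumption with
    # accumulating features over the whole dialogue for reward classification.
    from collections import defaultdict

    ACTIONS = ["execute", "confirm", "repair"]
    ALPHA, GAMMA = 0.1, 0.9

    # --- Markov approach: value depends only on (current state, action) ---
    Q = defaultdict(float)

    def q_update(state, action, reward, next_state):
        # Q-learning backup: everything before `state` is ignored,
        # which is exactly the constraint the paper evaluates.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

    # --- Alternative: features accumulated over all turns so far ---
    def accumulated_features(history):
        # `history` is a list of per-turn dicts; the feature vector grows
        # with the dialogue instead of being a fixed Markov state.
        n = len(history)
        return {
            "turns": n,
            "confirms_so_far": sum(t["action"] == "confirm" for t in history),
            "mean_score": sum(t["score"] for t in history) / n,
        }

    q_update(state="low_confidence", action="confirm", reward=-1.0,
             next_state="high_confidence")
    history = [{"action": "confirm", "score": 0.42},
               {"action": "execute", "score": 0.87}]
    print(accumulated_features(history))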



Author information

Correspondence to Tim Paek.

Appendix

The following list includes some of the features utilized for the models. The term Cmd in a feature name refers to the user command, and the term Token refers to the user utterance. For example, if the user says “go back”, the Cmd will be BACK and the Token will be “go back”.
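As a minimal sketch of this terminology (the NBestEntry container below is our own illustration, not code from the paper), each n-best hypothesis pairs a command with the raw token and a confidence score:

    from dataclasses import dataclass

    @dataclass
    class NBestEntry:
        # Hypothetical container for one n-best hypothesis; illustration only.
        cmd: str      # user command, e.g. "BACK"
        token: str    # user utterance, e.g. "go back"
        score: float  # ASR confidence score

    # The user says "go back": the command is BACK, the token is the raw text.
    entry = NBestEntry(cmd="BACK", token="go back", score=0.91)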

A. Within-utterance ASR features (for each turn i)

  1. {Top|Second|Third} Cmd (i): The command that occupies the {first|second|third} position in the top-n list.
  2. {Top|Second|Third} token (i): The token that occupies the {first|second|third} position in the top-n list.
  3. {Top|Second|Third} score (i): The confidence score that occupies the {first|second|third} position in the top-n list.
  4. Number of false recognitions (i): Number of passes through SAPI’s word lattice that fail to recognize a phrase.
  5. Number of interferences (i): Number of times SAPI raised an event indicating that recognition might have been compromised by a particular type of audio distortion.
  6. Most freq interference (i): Most common type of audio-distortion event raised by SAPI.
  7. Number of sound starts (i): Number of times any sound start point is detected.
  8. Number of sound ends (i): Number of times any sound end point is detected.
  9. Number of phrase starts (i): Number of times a phrase is detected in the audio stream.
  10. Maximum redundant {Cmd|Token|Combined} (i): The cardinality of the most frequently occurring {command|token|both}.
  11. Maximum number {Cmd|Token|Combined} matches (i): The cardinality of distinct {commands|tokens|both} that repeat.
  12. Score count (i): Number of items in the n-best list.
  13. Score sum (i): Sum of all the confidence scores.
  14. Maximum score (i): Maximum confidence score.
  15. Minimum score (i): Minimum confidence score.
  16. Score range (i): Difference between the maximum and minimum confidence scores.
  17. Score median (i): Median confidence score, if any.
  18. Score mean (i): Arithmetic mean of the confidence scores.
  19. Score geometric mean (i): Geometric mean of the confidence scores.
  20. Greatest consecutive difference (i): Greatest difference between any two consecutive confidence scores, if there are two or more.
  21. Score variance (i): Variance of the confidence scores.
  22. Score stdev (i): Standard deviation of the confidence scores.
  23. Score stderr (i): Standard error of the confidence scores.
  24. Score mode (i): Mode of the confidence scores.
  25. Cmds all same (i): Whether the commands in the current n-best list are all the same, whether only the top two are the same, or neither.
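Most of the score statistics above (items 12–24) are standard aggregates over the confidence scores of a single n-best list. A minimal sketch, assuming the scores arrive as a plain list of floats (the function and feature names are ours):

    import math
    import statistics

    def score_features(scores):
        # Aggregates over the confidence scores of one n-best list.
        n = len(scores)
        feats = {
            "score_count": n,
            "score_sum": sum(scores),
            "max_score": max(scores),
            "min_score": min(scores),
            "score_range": max(scores) - min(scores),
            "score_median": statistics.median(scores),
            "score_mean": statistics.fmean(scores),
            # Geometric mean assumes strictly positive confidence scores.
            "score_geo_mean": math.exp(sum(math.log(s) for s in scores) / n),
        }
        if n >= 2:
            # Greatest difference between consecutive scores in n-best order.
            feats["greatest_consecutive_diff"] = max(
                abs(a - b) for a, b in zip(scores, scores[1:]))
            feats["score_variance"] = statistics.variance(scores)
            feats["score_stdev"] = statistics.stdev(scores)
            # Standard error of the mean.
            feats["score_stderr"] = feats["score_stdev"] / math.sqrt(n)
        return feats

    print(score_features([0.91, 0.64, 0.58]))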

B. Between-utterance ASR features (for i = 2 and i = 3)

  1. Index of top Cmd in previous (i): The position of the current top command in the previous n-best list, if any.
  2. Index of top Cmd in first slice (i): The position of the current top command in the first-turn n-best list, if any.
  3. Top Cmd same as previous (i): Whether the current top command was the previous top command.
  4. Index of top token in previous (i): The position of the current top token in the previous n-best list, if any.
  5. Index of top token in first slice (i): The position of the current top token in the first-turn n-best list, if any.
  6. Score more than previous (i): Whether the average confidence score is greater than the previous average confidence score.
  7. Gap between top scores (i): Difference between the current top confidence score and the previous top confidence score.
  8. Gap between top scores with first slice (i): Difference between the current top confidence score and the first-turn top confidence score.
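These features compare the current n-best list against an earlier one. A minimal sketch, assuming each n-best list is held as a best-first list of dicts (the layout and names are our illustration):

    def between_turn_features(current, previous):
        # Each argument is an n-best list ordered best-first, holding
        # dicts like {"cmd": "BACK", "score": 0.9}.
        top, prev_top = current[0], previous[0]
        prev_cmds = [e["cmd"] for e in previous]
        cur_avg = sum(e["score"] for e in current) / len(current)
        prev_avg = sum(e["score"] for e in previous) / len(previous)
        return {
            # Position of the current top command in the previous n-best, if any.
            "index_top_cmd_in_previous":
                prev_cmds.index(top["cmd"]) if top["cmd"] in prev_cmds else None,
            "top_cmd_same_as_previous": top["cmd"] == prev_top["cmd"],
            "score_more_than_previous": cur_avg > prev_avg,
            "gap_between_top_scores": top["score"] - prev_top["score"],
        }

    prev = [{"cmd": "BACK", "score": 0.71}, {"cmd": "STOP", "score": 0.40}]
    cur = [{"cmd": "BACK", "score": 0.88}, {"cmd": "HOME", "score": 0.33}]
    print(between_turn_features(cur, prev))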

C. Dialogue features

  1. Turn: The current dialogue step.
  2. Has Confirm: Whether or not a confirmation has been performed at any point in the previous turns.
  3. Number of repairs so far (i): Number of repairs up to turn i.
  4. Number of confirms so far (i): Number of confirmations up to turn i.
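The dialogue features are running counters accumulated as the interaction proceeds. A minimal sketch of such an accumulator (the DialogueState class and the action labels are our own illustration):

    from dataclasses import dataclass

    @dataclass
    class DialogueState:
        # Hypothetical accumulator for the dialogue features above.
        turn: int = 0
        has_confirm: bool = False
        repairs_so_far: int = 0
        confirms_so_far: int = 0

        def update(self, system_action):
            # Called once per turn with the system action just taken.
            self.turn += 1
            if system_action == "confirm":
                self.has_confirm = True
                self.confirms_so_far += 1
            elif system_action == "repair":
                self.repairs_so_far += 1

    state = DialogueState()
    for action in ["ask", "confirm", "repair"]:
        state.update(action)
    print(state)  # DialogueState(turn=3, has_confirm=True, ...)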

Cite this article

Paek, T., Chickering, D.M. Evaluating the Markov assumption in Markov Decision Processes for spoken dialogue management. Lang Resources & Evaluation 40, 47–66 (2006). https://doi.org/10.1007/s10579-006-9008-2
