Abstract
This chapter includes two major sections. In Sect. 3.1, we introduce sequential decision making and study its supporting mathematical framework. We describe the Markov decision process (MDP) and partially observable MDP (POMDP) frameworks and present well-known algorithms for solving them. In Sect. 3.2, we introduce spoken dialog systems (SDSs) and then review related work on sequential decision making in spoken dialog management, in particular research applying the POMDP framework to dialog management. Finally, we review the user modeling techniques that have been used for dialog POMDPs.
Notes
- 1.
Note that here we assume that PBVI is performed on a fixed set of randomly sampled belief points, as in the PERSEUS algorithm proposed by Spaan and Vlassis (2005).
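The idea in the note above can be sketched in code. The following is a minimal, illustrative implementation of PERSEUS-style randomized point-based value iteration over a fixed belief set, applied to a toy two-state POMDP; the model parameters, variable names, and problem itself are our own assumptions for illustration, not taken from the chapter or from Spaan and Vlassis (2005).

```python
import random

# Toy tiger-style POMDP (illustrative assumption): 2 states, 3 actions
# (listen, open-left, open-right), 2 observations (hear-left, hear-right).
S, A, O = 2, 3, 2
gamma = 0.95

# T[a][s][s2]: transition, Z[a][s2][o]: observation, R[a][s]: reward.
T = [[[1, 0], [0, 1]],          # listen: state unchanged
     [[0.5, 0.5], [0.5, 0.5]],  # open-left: problem resets
     [[0.5, 0.5], [0.5, 0.5]]]  # open-right: problem resets
Z = [[[0.85, 0.15], [0.15, 0.85]],  # listening is informative
     [[0.5, 0.5], [0.5, 0.5]],
     [[0.5, 0.5], [0.5, 0.5]]]
R = [[-1, -1], [-100, 10], [10, -100]]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def value(b, Gamma):
    """Value of belief b under the alpha-vector set Gamma."""
    return max(dot(b, alpha) for alpha in Gamma)

def backup(b, Gamma):
    """Point-based Bellman backup at belief b: return the best alpha-vector."""
    best, best_val = None, float("-inf")
    for a in range(A):
        g = list(R[a])
        for o in range(O):
            # For each (a, o), pick the alpha in Gamma maximizing b . g_{a,o}.
            cands = [[sum(T[a][s][s2] * Z[a][s2][o] * alpha[s2]
                          for s2 in range(S)) for s in range(S)]
                     for alpha in Gamma]
            gao = max(cands, key=lambda v: dot(b, v))
            g = [g[s] + gamma * gao[s] for s in range(S)]
        if dot(b, g) > best_val:
            best, best_val = g, dot(b, g)
    return best

def perseus(B, iters=30):
    """PERSEUS: each sweep backs up randomly chosen beliefs from the fixed
    set B until every belief in B has an improved value."""
    Gamma = [[min(min(r) for r in R) / (1 - gamma)] * S]  # pessimistic init
    for _ in range(iters):
        todo, new = list(B), []
        while todo:
            b = random.choice(todo)
            alpha = backup(b, Gamma)
            if dot(b, alpha) < value(b, Gamma):  # keep old best if no gain
                alpha = max(Gamma, key=lambda a: dot(b, a))
            new.append(alpha)
            # Beliefs already improved by the new set leave the agenda.
            todo = [bb for bb in todo if value(bb, new) < value(bb, Gamma)]
        Gamma = new
    return Gamma

random.seed(0)
B = [[p, 1 - p] for p in [0.0, 0.25, 0.5, 0.75, 1.0]]  # fixed random-style set
Gamma = perseus(B)
V = lambda b: value(b, Gamma)
```

Because each sweep only backs up beliefs whose value has not yet improved, PERSEUS typically performs far fewer backups per iteration than full PBVI while still improving the value of every point in the fixed set.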
References
Ai, H., & Litman, D. J. (2007). Knowledge consistent user simulations for dialog systems. In Proceedings of the 8th Annual Conference of the International Speech Communication Association (INTERSPEECH’07), Antwerp.
Atrash, A., & Pineau, J. (2010). A Bayesian method for learning POMDP observation parameters for robot interaction management systems. In The POMDP Practitioners Workshop.
Bellman, R. (1957a). Dynamic programming. Princeton: Princeton University Press.
Bellman, R. (1957b). A Markovian decision process. Journal of Mathematics and Mechanics, 6(6), 679–684.
Bonet, B., & Geffner, H. (2003). Faster heuristic search algorithms for planning with uncertainty and full feedback. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI’03), Acapulco, Mexico.
Cassandra, A., Kaelbling, L., & Littman, M. (1994). Acting optimally in partially observable stochastic domains. In Proceedings of the 12th National Conference on Artificial Intelligence (AAAI’94), Seattle, WA.
Chandramohan, S., Geist, M., Lefevre, F., & Pietquin, O. (2011). User simulation in dialogue systems using inverse reinforcement learning. In Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH’11), Florence.
Clark, H., & Brennan, S. (1991). Grounding in communication. Perspectives on Socially Shared Cognition, 13(1991), 127–149.
Cuayáhuitl, H., Renals, S., Lemon, O., & Shimodaira, H. (2005). Human-computer dialogue simulation using hidden Markov models. In Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU’05), San Juan, PR.
Dai, P., & Goldsmith, J. (2007). Topological value iteration algorithm for Markov decision processes. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI’07), Hyderabad.
Dibangoye, J. S., Shani, G., Chaib-draa, B., & Mouaddib, A. (2009). Topological order planner for POMDPs. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI’09), Pasadena, CA.
Doshi, F., & Roy, N. (2007). Efficient model learning for dialog management. In Proceedings of the 2nd ACM SIGCHI/SIGART Conference on Human-Robot Interaction (HRI’07), Arlington, VA.
Doshi, F., & Roy, N. (2008). Spoken language interaction with model uncertainty: An adaptive human-robot interaction system. Connection Science, 20(4), 299–318.
Doshi-Velez, F., Pineau, J., & Roy, N. (2012). Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs. Artificial Intelligence, 187, 115–132.
Eckert, W., Levin, E., & Pieraccini, R. (1997). User modeling for spoken dialogue system evaluation. In Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU’97), Santa Barbara, CA (pp. 80–87).
Frampton, M., & Lemon, O. (2009). Recent research advances in reinforcement learning in spoken dialogue systems. Knowledge Engineering Review, 24(4), 375–408.
Gašić, M., Keizer, S., Mairesse, F., Schatzmann, J., Thomson, B., Yu, K., et al. (2008). Training and evaluation of the HIS POMDP dialogue system in noise. In Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue (SIGdial’08), Columbus, OH.
Georgila, K., Henderson, J., & Lemon, O. (2005). Learning user simulations for information state update dialogue systems. In Proceedings of the 6th Annual Conference of the International Speech Communication Association (INTERSPEECH’05), Lisbon.
Georgila, K., Henderson, J., & Lemon, O. (2006). User simulation for spoken dialogue systems: Learning and evaluation. In Proceedings of the 7th Annual Conference of the International Speech Communication Association (INTERSPEECH’06), Pittsburgh, PA.
Hauskrecht, M. (2000). Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 13, 33–94.
Kaelbling, L., Littman, M., & Cassandra, A. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1–2), 99–134.
Keizer, S., Gašić, M., Jurčíček, F., Mairesse, F., Thomson, B., Yu, K., et al. (2010). Parameter estimation for agenda-based user simulation. In Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 116–123). Tokyo, Japan: Association for Computational Linguistics.
Kim, D., Kim, J., & Kim, K. (2011). Robust performance evaluation of POMDP-based dialogue systems. IEEE Transactions on Audio, Speech, and Language Processing, 19(4), 1029–1040.
Kim, D., Sim, H. S., Kim, K.-E., Kim, J. H., Kim, H., & Sung, J. W. (2008). Effects of user modeling on POMDP-based dialogue systems. In Proceedings of the 9th Annual Conference of the International Speech Communication Association (INTERSPEECH’08), Brisbane.
Lee, D., & Seung, H. (2001). Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems, 13, 556–562.
Levin, E., & Pieraccini, R. (1997). A stochastic model of computer-human interaction for learning dialogue strategies. In Proceedings of 5th European Conference on Speech Communication and Technology (Eurospeech’97), Rhodes.
Li, X., Cheung, W., Liu, J., & Wu, Z. (2007). A novel orthogonal nmf-based belief compression for POMDPs. In Proceedings of the 24th International Conference on Machine learning (ICML’07), Corvallis.
Lison, P. (2013). Model-based Bayesian reinforcement learning for dialogue management. In Proceedings of the 14th Annual Conference of the International Speech Communication Association (INTERSPEECH’13), Lyon.
Lusena, C., Goldsmith, J., & Mundhenk, M. (2001). Nonapproximability results for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 14, 83–103.
Madani, O., Hanks, S., & Condon, A. (1999). On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision problems. In Proceedings of the 16th National Conference on Artificial Intelligence (AAAI’99) and the 11th Innovative Applications of Artificial Intelligence Conference, Orlando, FL.
Monahan, G. (1982). A survey of partially observable Markov decision processes: Theory, models, and algorithms. Management Science, 28, 1–16.
Papadimitriou, C., & Tsitsiklis, J. (1987). The complexity of Markov decision processes. Mathematics of Operations Research, 12(3), 441–450.
Paquet, S. (2006). Distributed decision-making and task coordination in dynamic, uncertain and real-time multiagent environments. Ph.D. thesis, Université Laval.
Paquet, S., Tobin, L., & Chaib-draa, B. (2005). An online POMDP algorithm for complex multiagent environments. In Proceedings of the 4th International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS’05), Utrecht.
Pieraccini, R., Levin, E., & Eckert, W. (1997). Learning dialogue strategies within the Markov decision process framework. In Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU’97), Santa Barbara, CA.
Pietquin, O. (2004). A framework for unsupervised learning of dialogue strategies. Ph.D. thesis, Faculté Polytechnique de Mons.
Pietquin, O. (2006). Consistent goal-directed user model for realistic man-machine task-oriented spoken dialogue simulation. In Proceedings of IEEE International Conference on Multimedia and Expo (ICME’06), Toronto, ON (pp. 425–428).
Pietquin, O., & Dutoit, T. (2006). A probabilistic framework for dialog simulation and optimal strategy learning. IEEE Transactions on Audio, Speech, and Language Processing, 14(2), 589–599.
Pineau, J. (2004). Tractable planning under uncertainty: Exploiting structure. Ph.D. thesis, Rutgers University.
Pineau, J., Gordon, G., & Thrun, S. (2003). Point-based value iteration: An anytime algorithm for POMDPs. In International Joint Conference on Artificial Intelligence (IJCAI’03), Acapulco.
Png, S., & Pineau, J. (2011). Bayesian reinforcement learning for POMDP-based dialogue systems. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’11), Prague.
Png, S., Pineau, J., & Chaib-draa, B. (2012). Building adaptive dialogue systems via Bayes-adaptive POMDPs. IEEE Journal of Selected Topics in Signal Processing, 6(8), 917–927.
Poupart, P., & Boutilier, C. (2002). Value-directed compression of POMDPs. In Advances in Neural Information Processing Systems 14 (NIPS’02), Vancouver, BC.
Rieser, V., & Lemon, O. (2006). Cluster-based user simulations for learning dialogue strategies. In Proceedings of the 7th Annual Conference of the International Speech Communication Association (INTERSPEECH’06), Pittsburgh, PA.
Rieser, V., & Lemon, O. (2011). Reinforcement learning for adaptive dialogue systems: a data-driven methodology for dialogue management and natural language generation. Springer Science & Business Media.
Ross, S., Pineau, J., Chaib-draa, B., & Kreitmann, P. (2011). A Bayesian approach for learning and planning in partially observable Markov decision processes. Journal of Machine Learning Research, 12, 1729–1770.
Ross, S., Pineau, J., Paquet, S., & Chaib-draa, B. (2008). Online planning algorithms for POMDPs. Journal of Artificial Intelligence Research, 32(1), 663–704.
Roy, N., Gordon, J., & Thrun, S. (2005). Finding approximate POMDP solutions through belief compression. Journal of Artificial Intelligence Research, 23, 1–40.
Roy, N., Pineau, J., & Thrun, S. (2000). Spoken dialogue management using probabilistic reasoning. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics (ACL’00), Hong Kong.
Schatzmann, J., Thomson, B., Weilhammer, K., Ye, H., & Young, S. (2007). Agenda-based user simulation for bootstrapping a POMDP dialogue system. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers (pp. 149–152). Association for Computational Linguistics.
Schatzmann, J., Weilhammer, K., Stuttle, M., & Young, S. (2006). A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies. Knowledge Engineering Review, 21(2), 97–126.
Schatzmann, J., & Young, S. (2009). The hidden agenda user simulation model. IEEE Transactions on Audio, Speech, and Language Processing, 17(4), 733–747.
Scheffler, K., & Young, S. (2000). Probabilistic simulation of human-machine dialogues. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’00) (Vol. 2, pp. 1217–1220).
Smallwood, R., & Sondik, E. (1973). The optimal control of partially observable Markov processes over a finite horizon. Operations Research, 21, 1071–1088.
Smith, T., & Simmons, R. (2004). Heuristic search value iteration for POMDPs. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI’04), Banff, AB.
Sondik, E. (1971). The optimal control of partially observable Markov processes. Ph.D. thesis, Stanford University.
Spaan, M., & Vlassis, N. (2004). A point-based POMDP algorithm for robot planning. In Proceedings of IEEE International Conference on Robotics and Automation (ICRA’04), New Orleans, LA.
Spaan, M., & Vlassis, N. (2005). Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research, 24(1), 195–220.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.
Thomson, B. (2009). Statistical methods for spoken dialogue management. Ph.D. thesis, Department of Engineering, University of Cambridge.
Thomson, B., & Young, S. (2010). Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems. Computer Speech and Language, 24(4), 562–588.
Traum, D. (1994). A computational theory of grounding in natural language conversation. Ph.D. thesis, University of Rochester.
Watkins, C. J. C. H., & Dayan, P. (1992). Technical note: Q-learning. Machine Learning, 8, 279–292.
Wierstra, D., & Wiering, M. (2004). Utile distinction hidden Markov models. In Proceedings of the Twenty-First International Conference on Machine Learning (p. 108). New York: ACM.
Williams, J. D. (2006). Partially observable Markov decision processes for spoken dialogue management. Ph.D. thesis, Department of Engineering, University of Cambridge.
Williams, J. D., & Young, S. (2007). Partially observable Markov decision processes for spoken dialog systems. Computer Speech and Language, 21, 393–422.
Young, S., Gašić, M., Thomson, B., & Williams, J. D. (2013). POMDP-based statistical spoken dialog systems: A review. Proceedings of the IEEE, 101(5), 1160–1179.
Zhang, B., Cai, Q., Mao, J., & Guo, B. (2001b). Planning and acting under uncertainty: A new model for spoken dialogue system. In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence (UAI’01), Seattle, Washington.
Copyright information
© 2016 The Authors
Cite this chapter
Chinaei, H., & Chaib-draa, B. (2016). Sequential Decision Making in Spoken Dialog Management. In: Building Dialogue POMDPs from Expert Dialogues. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-26200-0_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26198-0
Online ISBN: 978-3-319-26200-0