Part of the book series: Springer Theses

Abstract

Once a suitable definition of the system’s belief state is found, the system designer must define how actions are to be taken. The policy, denoted by \(\pi \), is the component that decides the action. Section 2.3 gave a brief overview of established techniques for hand-crafting these decisions. This chapter discusses algorithms that can be used to automate the decision-making process.


Notes

  1. In the partially observable case, \(b\) will be a probability distribution.

  2. This is known to be finite because the system is episodic.

  3. This approach is no less general than defining arbitrary basis functions, since the set of features can always be defined to include any value that is desired in a particular basis function.

  4. Note that this approach to deciding the number of matching venues is inefficient when the database is large. An alternative approach is discussed in Chap. 7.

  5. The TownInfo system has \(N_a = 28\) and \(N_c = 10\), so the total number of parameters would be \(28\times (7\times 10 + 4) = 2072\).

  6. The number of parameters for the inform summary act is unchanged at 74. The number of other parameters is \(7\times 9 + 4 = 67\). The number of remaining parameters for the request, select and confirm summary acts is \(27\times 7 = 189\). The total is therefore \(74 + 67 + 189 = 330\).

  7. The occupancy frequency is also sometimes called the state distribution (Peters et al. 2005).
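The parameter counts quoted in notes 5 and 6 can be checked with a short calculation. The sketch below is only an arithmetic check, not the chapter's implementation: the constants are copied verbatim from the notes, while the variable names and the helper `params_per_act` are hypothetical and introduced purely for illustration. The notes as shown here do not say what the factors 7, 4, 9 and 27 represent, so the code does not attempt to interpret them.

```python
# Arithmetic check of the parameter counts in notes 5 and 6.
# All numbers are copied from the notes; the names are hypothetical.

N_A = 28  # N_a for the TownInfo system (note 5)
N_C = 10  # N_c for the TownInfo system (note 5)


def params_per_act(n: int) -> int:
    """Evaluate the expression 7 * n + 4 that appears in both notes."""
    return 7 * n + 4


# Note 5: every one of the N_a acts has its own 7 * N_c + 4 parameters.
untied_total = N_A * params_per_act(N_C)

# Note 6: the three groups of parameters listed there.
inform_params = params_per_act(N_C)  # "unchanged at 74"
other_params = params_per_act(9)     # 7 * 9 + 4 = 67
remaining_params = 27 * 7            # 189
tied_total = inform_params + other_params + remaining_params

print(untied_total, tied_total)
```

Under these figures, the scheme in note 6 uses roughly one sixth of the parameters counted in note 5.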

References

  • Amari S (1998) Natural gradient works efficiently in learning. Neural Comput 10:251–276

  • Bradtke SJ, Barto AG (1996) Linear least-squares algorithms for temporal difference learning. Mach Learn 22(1–3):33–57. ISSN 0885–6125

  • Peters J, Vijayakumar S, Schaal S (2005) Natural actor-critic. In: Proceedings of ECML. Springer, Heidelberg, pp 280–291

  • Schatzmann J (2008) Statistical user modeling for dialogue systems. Ph.D. thesis, University of Cambridge

  • Schatzmann J, Thomson B, Weilhammer K, Ye H, Young S (2007) Agenda-based user simulation for bootstrapping a POMDP dialogue system. In: Proceedings of HLT/NAACL

  • Sutton R, Barto A (1998) Reinforcement learning: an introduction. Adaptive computation and machine learning. MIT Press, Cambridge

  • Sutton RS, McAllester D, Singh S, Mansour Y (2000) Policy gradient methods for reinforcement learning with function approximation. In: NIPS 12. MIT Press, Cambridge, pp 1057–1063

  • Williams JD, Young S (2005) Scaling up POMDPs for dialog management: the “Summary POMDP” method. In: Proceedings of ASRU, pp 177–182

Author information

Correspondence to Blaise Thomson.

Copyright information

© 2013 Springer-Verlag London

About this chapter

Cite this chapter

Thomson, B. (2013). Policy Design. In: Statistical Methods for Spoken Dialogue Management. Springer Theses. Springer, London. https://doi.org/10.1007/978-1-4471-4923-1_5

  • DOI: https://doi.org/10.1007/978-1-4471-4923-1_5

  • Published:

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4922-4

  • Online ISBN: 978-1-4471-4923-1

  • eBook Packages: Engineering, Engineering (R0)
