Speech production in human-machine dialogue: A natural language generation perspective

  • Dialogue Units and Prosodic Aspect of Spoken Dialogue Processing
  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1236)

Abstract

This article discusses speech production in dialogue from the perspective of natural language generation, focusing on the selection of appropriate intonation. We argue that in order to assign appropriate intonation contours in speech-producing systems, it is vital to acknowledge the diversity of functions that intonation fulfills and to account for communicative and immediate contexts as major factors constraining intonation selection. Bringing forward arguments from a functional-linguistically motivated natural language generation architecture, we present a model of context-to-speech as an alternative to the traditional text-to-speech and concept-to-speech approaches.

Authors appear in alphabetical order. This work was partially funded by the European Union Programme Copernicus, Project No. 10393 (SPEAK!) under contract with the Darmstadt University of Technology. All authors have been actively involved in the project at various stages, either under employment at the Darmstadt University of Technology or GMD-IPSI.

Editor information

Elisabeth Maier, Marion Mast, Susann LuperFoy

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grote, B., Hagen, E., Stein, A., Teich, E. (1997). Speech production in human-machine dialogue: A natural language generation perspective. In: Maier, E., Mast, M., LuperFoy, S. (eds) Dialogue Processing in Spoken Language Systems. DPSLS 1996. Lecture Notes in Computer Science, vol 1236. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63175-5_38

  • DOI: https://doi.org/10.1007/3-540-63175-5_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63175-0

  • Online ISBN: 978-3-540-69206-5

  • eBook Packages: Springer Book Archive
