Resolving Structural Ambiguity in Generated Speech

  • Conference paper
Natural Language Generation (INLG 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3123)

Abstract

Ambiguity in the output is a concern for NLG in general. This paper considers the case of structural ambiguity in spoken language generation. We present an algorithm that inserts pauses into spoken text to try to resolve potential structural ambiguities. The algorithm is based on a simple model of the human parser and a characterisation of a subset of the places where local ambiguity can arise. A preliminary evaluation compares the success of this method with that of some previously proposed algorithms for inserting pauses for this purpose.
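
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea of placing pauses at structural boundaries. It is not the paper's method. It assumes the generator can supply a bracketed structure for the sentence (here, nested Python lists), and it uses a generic heuristic: the strength of the gap between two adjacent words is the number of constituents that close or open there. The names `boundary_strengths`, `insert_pauses`, the `<pause>` marker and the threshold are all hypothetical.

```python
from typing import Iterator, List, Tuple, Union

# A tree is either a word or a list of subtrees (assumed input format).
Tree = Union[str, List["Tree"]]


def leaves_with_paths(tree: Tree, path: Tuple[int, ...] = ()) -> Iterator[Tuple[str, Tuple[int, ...]]]:
    """Yield (word, path) pairs; path holds the child indices from the root."""
    if isinstance(tree, str):
        yield tree, path
    else:
        for i, child in enumerate(tree):
            yield from leaves_with_paths(child, path + (i,))


def boundary_strengths(tree: Tree) -> List[int]:
    """For each gap between adjacent words, count the brackets crossed.

    Strength = constituents closing after the left word
             + constituents opening before the right word.
    """
    items = list(leaves_with_paths(tree))
    strengths = []
    for (_, left), (_, right) in zip(items, items[1:]):
        shared = 0  # length of the common path prefix (lowest common ancestor)
        while shared < min(len(left), len(right)) and left[shared] == right[shared]:
            shared += 1
        # Levels strictly between the common ancestor and each leaf.
        strengths.append((len(left) - shared - 1) + (len(right) - shared - 1))
    return strengths


def insert_pauses(tree: Tree, threshold: int = 1, marker: str = "<pause>") -> str:
    """Join the words, adding a pause marker at gaps whose strength >= threshold."""
    words = [word for word, _ in leaves_with_paths(tree)]
    out = words[:1]
    for word, strength in zip(words[1:], boundary_strengths(tree)):
        if strength >= threshold:
            out.append(marker)
        out.append(word)
    return " ".join(out)


# Two readings of the same word string, distinguished only by bracketing.
reading_a = [["old", "men"], "and", "women"]   # only the men are old
reading_b = ["old", ["men", "and", "women"]]   # both groups are old

print(insert_pauses(reading_a))  # old men <pause> and women
print(insert_pauses(reading_b))  # old <pause> men and women
```

The two readings share the same words but receive pauses in different places, which is the kind of cue a listener could use to choose between structures. A real system would also need to decide which boundaries are actually ambiguous for a listener, which is where a parser model of the kind the abstract mentions would come in.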

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mellish, C. (2004). Resolving Structural Ambiguity in Generated Speech. In: Belz, A., Evans, R., Piwek, P. (eds) Natural Language Generation. INLG 2004. Lecture Notes in Computer Science, vol 3123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27823-8_12

  • DOI: https://doi.org/10.1007/978-3-540-27823-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22340-5

  • Online ISBN: 978-3-540-27823-8

  • eBook Packages: Springer Book Archive
