Resolving Structural Ambiguity in Generated Speech

Mellish, Chris

doi:10.1007/978-3-540-27823-8_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3123)

Included in the following conference series:

International Conference on Natural Language Generation

Abstract

Ambiguity in the output is a concern for NLG in general. This paper considers the case of structural ambiguity in spoken language generation. We present an algorithm which inserts pauses in spoken text in order to attempt to resolve potential structural ambiguities. This is based on a simple model of the human parser and a characterisation of a subset of places where local ambiguity can arise. A preliminary evaluation contrasts the success of this method with that of some already proposed algorithms for inserting pauses for this purpose.

Author information

Authors and Affiliations

Department of Computing Science, University of Aberdeen, King’s College, ABERDEEN, AB24 3UE, UK
Chris Mellish

Editor information

Editors and Affiliations

Information Technology Research Institute, University of Brighton, Lewes Road, BN2 4GJ, Brighton, UK
Anja Belz
University of Brighton, Brighton, UK
Roger Evans
NLG Group, Centre for Research in Computing, The Open University, Walton Hall, MK7 6AA, Milton Keynes, UK
Paul Piwek

About this paper

Cite this paper

Mellish, C. (2004). Resolving Structural Ambiguity in Generated Speech. In: Belz, A., Evans, R., Piwek, P. (eds) Natural Language Generation. INLG 2004. Lecture Notes in Computer Science (LNAI), vol 3123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27823-8_12

DOI: https://doi.org/10.1007/978-3-540-27823-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22340-5
Online ISBN: 978-3-540-27823-8
eBook Packages: Springer Book Archive
