Linguistic Processing for Speech Synthesis

Part of the book series: Springer Handbooks (SHB)

Abstract

An important part of any text-to-speech synthesis system is the linguistic processing component, which takes input text and converts it into a feature representation from which actual synthesis can proceed. Linguistic analysis is hard, in large measure because written language massively underspecifies linguistic information. This chapter reviews several issues in linguistic analysis, starting from low-level text normalization and ending with higher-level problems such as accent prediction and document-level analysis. We end with a prognosis for improvements over current technology.

Abbreviations

ASR: automatic speech recognition
NATO: North Atlantic Treaty Organization
POS: part-of-speech
TTS: text-to-speech
WFST: weighted finite-state transducer

Author information

Corresponding author

Correspondence to Richard Sproat, Ph.D.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Sproat, R. (2008). Linguistic Processing for Speech Synthesis. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_22

  • DOI: https://doi.org/10.1007/978-3-540-49127-9_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49125-5

  • Online ISBN: 978-3-540-49127-9

  • eBook Packages: Engineering, Engineering (R0)
