Linguistic Processing for Speech Synthesis

Sproat, Richard

doi:10.1007/978-3-540-49127-9_22

Richard Sproat Ph.D

Chapter

Part of the book series: Springer Handbooks (SHB)

Abstract

An important part of any text-to-speech synthesis system is the linguistic processing component, which takes input text and converts it into a feature representation from which actual synthesis can proceed. Linguistic analysis is hard, in large measure because written language massively underspecifies linguistic information. This chapter reviews several issues in linguistic analysis, starting with low-level text normalization and ending with higher-level problems such as accent prediction and document-level analysis. We end with a prognosis for improvements over current technology.

Abbreviations

ASR: automatic speech recognition
NATO: North Atlantic Treaty Organization
POS: part-of-speech
TTS: text-to-speech
WFST: weighted finite-state transducer

Author information

Authors and Affiliations

Department of Linguistics, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, 707 South Mathews Avenue, 61801, Urbana, IL, USA
Richard Sproat Ph.D

Corresponding author

Correspondence to Richard Sproat Ph.D.

Editor information

Editors and Affiliations

INRS-EMT, University of Quebec, 800 de la Gauchetiere Ouest, H5A 1K6, Montreal, Quebec, Canada
Jacob Benesty Dr.
Avayalabs Research, 233 Mount Airy Road, 07920, Basking Ridge, NJ, USA
M. Mohan Sondhi Ph.D.
Alcatel-Lucent, Bell Laboratories, 600 Mountain Avenue, 07974, Murray Hill, NJ, USA
Yiteng Arden Huang Dr.

About this chapter

Cite this chapter

Sproat, R. (2008). Linguistic Processing for Speech Synthesis. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_22

DOI: https://doi.org/10.1007/978-3-540-49127-9_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)
