Abstract
An important part of any text-to-speech synthesis system is the linguistic processing component that takes input text and converts it into a feature representation from which actual synthesis can proceed. Linguistic analysis is hard, in large measure because written language massively underspecifies linguistic information. This chapter reviews several issues in linguistic analysis, starting with low-level text normalization and ending with higher-level problems such as accent prediction and document-level analysis. We end with a prognosis for improvements over current technology.
Abbreviations
- ASR: automatic speech recognition
- NATO: North Atlantic Treaty Organization
- POS: part-of-speech
- TTS: text-to-speech
- WFST: weighted finite-state transducer
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Sproat, R. (2008). Linguistic Processing for Speech Synthesis. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_22
DOI: https://doi.org/10.1007/978-3-540-49127-9_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)