On the Extension of the Formal Prosody Model for TTS

Jůzová, Markéta; Tihelka, Daniel; Volín, Jan

doi:10.1007/978-3-030-00794-2_38

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11107)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

The formal prosody grammar used for TTS focuses mainly on describing the final prosodic words in phrases and sentences, which carry a special prosodic phenomenon representing a certain communication function within the language system. This paper introduces an extension of the prosody model that also takes into account the importance and distinctness of the first prosodic words in prosodic phrases. This phenomenon cannot change the semantic interpretation of the phrase, but the beginnings of prosodic phrases differ from the subsequent words and, based on the phonetic background, should be treated separately to achieve higher naturalness.

This research was supported by the Czech Science Foundation (GA CR), project No. GA16-04420S, and by the grant of the University of West Bohemia, project No. SGS-2016-039.

Notes

1. To the authors' knowledge, listeners find it much easier to concentrate on and compare two short sentences in a listening test than to compare two long compound sentences.

Author information

Authors and Affiliations

Faculty of Applied Sciences, New Technologies for the Information Society and Department of Cybernetics, University of West Bohemia, Pilsen, Czech Republic
Markéta Jůzová & Daniel Tihelka
Faculty of Arts, Institute of Phonetics, Charles University, Prague, Czech Republic
Jan Volín

Corresponding author

Correspondence to Markéta Jůzová.

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Aleš Horák
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Ivan Kopeček
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Karel Pala

About this paper

Cite this paper

Jůzová, M., Tihelka, D., Volín, J. (2018). On the Extension of the Formal Prosody Model for TTS. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2018. Lecture Notes in Computer Science, vol 11107. Springer, Cham. https://doi.org/10.1007/978-3-030-00794-2_38

DOI: https://doi.org/10.1007/978-3-030-00794-2_38
Published: 08 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00793-5
Online ISBN: 978-3-030-00794-2
eBook Packages: Computer Science, Computer Science (R0)
