On the Extension of the Formal Prosody Model for TTS

  • Markéta Jůzová
  • Daniel Tihelka
  • Jan Volín
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11107)

Abstract

The formal prosody grammar used for TTS focuses mainly on describing the final prosodic words of phrases and sentences, which carry a special prosodic phenomenon representing a certain communication function within the language system. This paper introduces an extension of the prosody model that also takes into account the importance and distinctiveness of the first prosodic words in prosodic phrases. This phenomenon cannot change the semantic interpretation of a phrase; however, the beginnings of prosodic phrases differ from the subsequent words and, given the phonetic background, should be handled separately to achieve higher naturalness.

Keywords

Unit selection, Formal prosody grammar, Prosodeme

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Faculty of Applied Sciences, New Technologies for the Information Society and Department of Cybernetics, University of West Bohemia, Pilsen, Czech Republic
  2. Faculty of Arts, Institute of Phonetics, Charles University, Prague, Czech Republic
