Prosodic Phrase Boundary Classification Based on Czech Speech Corpora

  • Markéta Jůzová
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10415)


The correct placement of phrase boundaries is important for natural-sounding and easily intelligible speech, so it is not surprising that boundary detection is also a part of text-to-speech systems. This paper uses large speech corpora in a classification-based approach to improve the phrasing of synthesized sentences. It compares the results of several classifiers with deterministic approaches based on punctuation and conjunctions and shows that the classifiers are able to outperform these simple algorithms.
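To make the deterministic baselines concrete, the following is a minimal sketch of what a punctuation-based and a conjunction-based boundary predictor might look like. The abstract does not spell out the paper's actual rules, so everything here is an assumption made for illustration: the function names, the punctuation set, the short list of Czech conjunctions, the choice of F1 as the comparison measure, and the toy reference labels.

```python
# Illustrative sketch only: the rules, word lists and labels below are
# assumptions, not the rules or data used in the paper.

PUNCTUATION = {",", ";", ":"}
# A few common Czech conjunctions, chosen only as an example list.
CONJUNCTIONS = {"a", "ale", "nebo", "že", "protože", "když"}


def baseline_boundaries(tokens, use_conjunctions=False):
    """Return one bool per token: True if a boundary is predicted after it.

    Punctuation marks are expected as separate tokens. The end of the
    sentence is not counted as a sentence-internal boundary.
    """
    predictions = []
    for i, token in enumerate(tokens):
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        if following is None:
            boundary = False
        elif token in PUNCTUATION:
            # Punctuation rule: a boundary follows the punctuation mark.
            boundary = True
        elif use_conjunctions and following.lower() in CONJUNCTIONS:
            # Conjunction rule: a boundary precedes the conjunction.
            boundary = True
        else:
            boundary = False
        predictions.append(boundary)
    return predictions


def f1_score(reference, predicted):
    """F1 of predicted boundaries against reference boundary labels."""
    tp = sum(1 for r, p in zip(reference, predicted) if r and p)
    fp = sum(1 for r, p in zip(reference, predicted) if not r and p)
    fn = sum(1 for r, p in zip(reference, predicted) if r and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy sentence ("He said that he would come and stay.") with invented labels.
tokens = ["Řekl", ",", "že", "přijde", "a", "zůstane", "."]
reference = [False, True, False, True, False, False, False]

punct_only = baseline_boundaries(tokens)
punct_and_conj = baseline_boundaries(tokens, use_conjunctions=True)
print(f1_score(reference, punct_only), f1_score(reference, punct_and_conj))
```

A trained classifier would replace `baseline_boundaries` with a model that predicts the same per-token labels from features of the surrounding words, so both kinds of system can be scored against the same reference annotation.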


Phrase boundary · Classification · Speech corpus · Speech synthesis



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. New Technologies for the Information Society (NTIS) and Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic
