Using Random Forests for Prosodic Break Prediction Based on Automatic Speech Labeling

Khomitsevich, Olga; Chistikov, Pavel; Zakharov, Dmitriy

doi:10.1007/978-3-319-11581-8_58

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8773)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

In this paper we present a system for automatically predicting prosodic breaks in synthesized speech using the Random Forests classifier. In our experiments the classifier is trained on a large dataset of audiobooks that is automatically labeled with phone, word, and pause labels. A rule-based POS tagger supplies part-of-speech (POS) tags for the text. We use cross-validation so that we can examine not only the results for a specific subset of the data but also the system's reliability across the dataset. The experimental results demonstrate that the system performs well and consistently on the audiobook database; the results are poorer and less robust on a smaller database of read speech, even though part of that database was labeled manually.
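
The cross-validation idea in the abstract, scoring each held-out subset separately and then looking at how much the scores vary, can be illustrated with a minimal sketch. This is not the authors' implementation: the trainer below is a simple majority-vote stand-in rather than a Random Forests classifier, and the feature layout (a POS tag plus a punctuation flag per word), the function names, and the toy data are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def f1_score(gold, pred):
    """F1 for the positive class (1 = prosodic break after the word)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_items, k):
    """Yield (train, test) index lists for k contiguous folds."""
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


def train_majority(samples, labels):
    """Stand-in trainer (NOT Random Forests): majority label per feature tuple."""
    counts = {}
    for sample, label in zip(samples, labels):
        counts.setdefault(sample, [0, 0])[label] += 1

    def predict(sample):
        no_break, brk = counts.get(sample, (1, 0))  # unseen features -> no break
        return 1 if brk > no_break else 0

    return predict


def cross_validate(samples, labels, train_fn, k=5):
    """Return per-fold F1 scores, their mean, and their standard deviation."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        pred = [model(samples[i]) for i in test_idx]
        gold = [labels[i] for i in test_idx]
        scores.append(f1_score(gold, pred))
    return scores, mean(scores), pstdev(scores)


# Hypothetical toy data: (POS tag, followed by punctuation?) -> break label.
toy_samples = [("NOUN", True), ("VERB", False)] * 5
toy_labels = [1, 0] * 5

scores, mean_f1, spread = cross_validate(toy_samples, toy_labels, train_majority, k=5)
print(scores, mean_f1, spread)
```

A small spread across folds corresponds to what the abstract calls consistent results; a large spread would indicate that performance depends heavily on which subset of the data is held out.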

Author information

Authors and Affiliations

National Research University of Information Technologies, Mechanics and Optics, 49 Kronverkskiy pr., Saint-Petersburg, Russia, 197101
Olga Khomitsevich & Pavel Chistikov
Speech Technology Center Ltd., 4 Krasutskogo st., Saint-Petersburg, Russia, 196084
Olga Khomitsevich, Pavel Chistikov & Dmitriy Zakharov

Editor information

Editors and Affiliations

Speech and Multimodal Interfaces Laboratory, St. Petersburg Institute of Informatics and Automation of the Russian Academy of Sciences, 39, 14th line, 199178, St. Petersburg, Russia
Andrey Ronzhin
Institute of Applied and Mathematical Linguistics, Moscow State Linguistic University, 38, Ostozhenka, 119034, Moscow, Russia
Rodmonga Potapova
Faculty of Technical Sciences, University of Novi Sad, 6, Trg Dositeja Obradovića, 21000, Novi Sad, Serbia
Vlado Delic

About this paper

Cite this paper

Khomitsevich, O., Chistikov, P., Zakharov, D. (2014). Using Random Forests for Prosodic Break Prediction Based on Automatic Speech Labeling. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_58

DOI: https://doi.org/10.1007/978-3-319-11581-8_58
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11580-1
Online ISBN: 978-3-319-11581-8
eBook Packages: Computer Science, Computer Science (R0)
