Abstract
We propose “tailor-made speech synthesis,” a speech synthesis technique that enables users to control synthetic speech naturally and intuitively. As a first step toward realizing tailor-made speech synthesis, we introduce F0 context into speaker model training for speech synthesis based on deep neural networks (DNNs). F0 context represents the relative log F0 at the mora or accent-phrase level of the training data. Unlike the conventional F0 context in HMM-based techniques, it allows users to control the F0 of synthetic speech steplessly. Experiments showed that F0 context was effective for controlling F0: the F0 of the synthetic speech followed the value of the F0 context.
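The abstract does not give the exact formula for the F0 context, so the following is only a minimal sketch of one plausible reading: the mean log F0 of each mora (or accent phrase) minus the mean log F0 of the whole utterance. The function name `relative_log_f0`, the utterance-level normalization, and the convention that 0.0 marks an unvoiced frame are all assumptions made for illustration, not the authors' definition.

```python
import math


def relative_log_f0(f0_hz, segments):
    """Hypothetical F0-context sketch (not the authors' exact definition).

    f0_hz:    per-frame F0 values in Hz; 0.0 marks an unvoiced frame (assumption).
    segments: list of (start_frame, end_frame) pairs, end exclusive,
              one pair per mora or accent phrase.
    Returns one value per segment: mean log F0 of the segment minus the
    utterance-level mean log F0, or None if the segment has no voiced frame.
    """
    voiced = [math.log(f) for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("utterance contains no voiced frames")
    utterance_mean = sum(voiced) / len(voiced)

    contexts = []
    for start, end in segments:
        seg = [math.log(f) for f in f0_hz[start:end] if f > 0]
        contexts.append(sum(seg) / len(seg) - utterance_mean if seg else None)
    return contexts


# Toy utterance: two moras, the second one an octave higher than the first.
f0 = [0.0, 100.0, 100.0, 200.0, 200.0, 0.0]
moras = [(0, 3), (3, 6)]
contexts = relative_log_f0(f0, moras)
```

Because the value is a real number rather than a quantized class label, a user could in principle set it to any level at synthesis time, which is one way to read the "stepless" control described above.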
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Yamada, S., Nose, T., Ito, A. (2017). A Study on Tailor-Made Speech Synthesis Based on Deep Neural Networks. In: Pan, JS., Tsai, PW., Huang, HC. (eds) Advances in Intelligent Information Hiding and Multimedia Signal Processing. Smart Innovation, Systems and Technologies, vol 63. Springer, Cham. https://doi.org/10.1007/978-3-319-50209-0_20
DOI: https://doi.org/10.1007/978-3-319-50209-0_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50208-3
Online ISBN: 978-3-319-50209-0
eBook Packages: Engineering, Engineering (R0)