Abstract
We compare the performance of two approaches that use cross-lingual data from different speakers to build bilingual speech synthesis systems capable of producing speech in both languages with the same speaker identity. The first approach treats the data from both languages as monolingual by labeling all of it with a manually joined phoneme set. A speaker-independent voice is trained on the joined data and adapted to the target speaker using CMLLR (constrained maximum likelihood linear regression) adaptation.
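CMLLR adapts a voice by applying one affine transform, o' = A o + b, to the observation vectors. The adapted likelihood is the Gaussian density of the transformed vector plus the Jacobian term log|det A|. The sketch below shows that computation for one diagonal-covariance Gaussian in pure Python. It is illustrative only. The function names are hypothetical, and the sketch covers only output distributions. Duration models and the estimation of A and b are out of scope here.

```python
import math

def mat_vec(a, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def log_abs_det(a):
    """log|det(a)| via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    n = len(m)
    log_det = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular transform")
        m[col], m[pivot] = m[pivot], m[col]  # a row swap only flips the sign
        log_det += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return log_det

def diag_gauss_logpdf(x, mean, var):
    """Log density of x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v
        for xi, mi, v in zip(x, mean, var)
    )

def cmllr_loglik(obs, a, b, mean, var):
    """Log-likelihood of obs under a CMLLR-adapted Gaussian (hypothetical helper).

    Feature-space form: transform the observation, o' = A o + b, then add
    the Jacobian term log|det A| so the density stays normalised.
    """
    transformed = [y + bi for y, bi in zip(mat_vec(a, obs), b)]
    return diag_gauss_logpdf(transformed, mean, var) + log_abs_det(a)

if __name__ == "__main__":
    obs, mean, var = [0.5, -0.2], [0.0, 0.0], [1.0, 2.0]
    identity, zero = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
    # With the identity transform, CMLLR reduces to the unadapted likelihood.
    print(cmllr_loglik(obs, identity, zero, mean, var))
    print(diag_gauss_logpdf(obs, mean, var))
```

In the first approach, a single set of transforms of this kind is estimated against the speaker-independent voice trained on the joined data.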
In the second approach, speaker-independent voices are trained separately for each language. A state mapping between these voices is derived automatically by minimizing the Kullback–Leibler divergence between state distributions. The mapping is then used to transfer the adaptation transformations estimated within one language to the speaker-independent voice of the other language.
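The sketch below illustrates minimum-KLD state mapping and how the mapping carries transforms across languages. It rests on several assumptions. Each state is reduced to a single diagonal-covariance Gaussian. The divergence is symmetrised by summing both directions. Every state of the target-language voice inherits the transform of its nearest state in the voice where adaptation data exist. The abstract does not specify these details, and the state names and transform labels are hypothetical.

```python
import math

def kld_diag(mean_p, var_p, mean_q, var_q):
    """KL(p || q) between two diagonal-covariance Gaussians."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )

def symmetric_kld(p, q):
    """Symmetrised KLD; p and q are (mean, var) tuples (assumed form)."""
    return kld_diag(p[0], p[1], q[0], q[1]) + kld_diag(q[0], q[1], p[0], p[1])

def map_states(source_states, target_states):
    """Map each target-language state to the source state with minimum KLD."""
    return {
        t_name: min(source_states, key=lambda s: symmetric_kld(source_states[s], t_dist))
        for t_name, t_dist in target_states.items()
    }

def transfer_transforms(mapping, source_transform_of):
    """Give each target state the transform estimated for its mapped source state."""
    return {t: source_transform_of[s] for t, s in mapping.items()}

if __name__ == "__main__":
    # Hypothetical states: name -> (mean vector, variance vector).
    hr = {"hr_a_s2": ([0.0, 1.0], [1.0, 1.0]), "hr_e_s2": ([3.0, -1.0], [1.0, 2.0])}
    sl = {"sl_a_s2": ([0.1, 0.9], [1.0, 1.0]), "sl_e_s2": ([2.8, -1.2], [1.2, 2.0])}
    mapping = map_states(hr, sl)
    print(mapping)  # {'sl_a_s2': 'hr_a_s2', 'sl_e_s2': 'hr_e_s2'}
    print(transfer_transforms(mapping, {"hr_a_s2": "W1", "hr_e_s2": "W2"}))
```

Here the adaptation data are taken to be in the first language (`hr`), so transforms estimated there are reused for the second-language voice (`sl`) through the mapping.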
We evaluate speech quality on the MOS scale, and the similarity of the synthesized voice to the target speaker using DMOS, on the Croatian-Slovene language pair.
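MOS and DMOS scores are both averages of listener ratings on a five-point scale, usually reported with a confidence interval. The sketch below computes the mean and the half-width of an approximate 95% interval using a normal approximation. The ratings are hypothetical. The normal approximation is an assumption and is not necessarily the procedure used in the evaluation.

```python
import math
import statistics

def mos_with_ci(ratings, z=1.96):
    """Mean opinion score and half-width of an approximate 95% CI.

    Works the same way for MOS (quality) and DMOS (degradation relative to
    a reference) ratings, as long as they are on the 1-5 scale.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mean = statistics.mean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width

if __name__ == "__main__":
    hypothetical_ratings = [4, 5, 3, 4, 4]
    mean, hw = mos_with_ci(hypothetical_ratings)
    print(f"MOS = {mean:.2f} +/- {hw:.2f}")  # MOS = 4.00 +/- 0.62
```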
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pobar, M., Justin, T., Žibert, J., Mihelič, F., Ipšić, I. (2013). A Comparison of Two Approaches to Bilingual HMM-Based Speech Synthesis. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol. 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_7
DOI: https://doi.org/10.1007/978-3-642-40585-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3
eBook Packages: Computer Science, Computer Science (R0)