A Comparison of Two Approaches to Bilingual HMM-Based Speech Synthesis

Pobar, Miran; Justin, Tadej; Žibert, Janez; Mihelič, France; Ipšić, Ivo

doi:10.1007/978-3-642-40585-3_7

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8082)

Abstract

We compare two approaches to building bilingual speech synthesis systems from cross-lingual data recorded by different speakers, where the goal is to produce speech in both languages with the same speaker identity. The first approach treats the data from both languages as monolingual by labeling all of it with a manually joined phoneme set. A speaker-independent voice is trained on the joined data and then adapted to the target speaker using constrained maximum likelihood linear regression (CMLLR) adaptation.

In the second approach, a speaker-independent voice is trained separately for each language. A state mapping between the two voices is derived automatically by minimizing the Kullback–Leibler divergence between state distributions. The mapping is then used to apply the adaptation transforms estimated in one language to the speaker-independent voice of the other language.
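
The minimum-divergence state mapping described above can be illustrated with a small sketch. It assumes each HMM state is summarized by a single diagonal-covariance Gaussian and uses a symmetrized Kullback–Leibler divergence; both choices, the function names, and the toy state labels and values are illustrative assumptions and are not taken from the paper.

```python
import math


def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(p, q):
    """Symmetrized divergence KL(p||q) + KL(q||p); p and q are (mean, var) pairs."""
    return (kl_diag_gauss(p[0], p[1], q[0], q[1])
            + kl_diag_gauss(q[0], q[1], p[0], p[1]))


def map_states(source_states, target_states):
    """Map every source state to the target state with minimum divergence."""
    mapping = {}
    for name, dist in source_states.items():
        mapping[name] = min(
            target_states,
            key=lambda t: symmetric_kl(dist, target_states[t]),
        )
    return mapping


# Hypothetical toy states: name -> (mean vector, variance vector).
slovene = {
    "sl_a_2": ([1.0, 0.0], [1.0, 1.0]),
    "sl_s_2": ([-2.0, 3.0], [0.5, 0.5]),
}
croatian = {
    "hr_a_2": ([0.9, 0.1], [1.1, 0.9]),
    "hr_s_2": ([-2.1, 2.8], [0.6, 0.4]),
}

mapping = map_states(croatian, slovene)
print(mapping)
```

In this sketch, a transform estimated for a state of one voice could be looked up through `mapping` and applied to the mapped state of the other voice. The direction of the mapping and the handling of multi-stream or multi-mixture states are left open here.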

We evaluate speech quality on the MOS scale, and the similarity of the synthesized speech to the target speaker using DMOS, for the Croatian-Slovene language pair.
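
As a rough illustration of how listener ratings on a five-point scale are typically summarized, the following sketch computes a mean opinion score with a normal-approximation 95% confidence interval. The function name, the interval method, and the ratings are assumptions made for illustration; they are not the paper's evaluation procedure or results.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mean, half-width of a normal-approximation 95% interval)."""
    n = len(ratings)
    mean = statistics.mean(ratings)
    if n > 1:
        half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(n)
    else:
        half_width = 0.0
    return mean, half_width


# Made-up ratings (1 = bad, 5 = excellent), not data from the paper.
toy_ratings = [1, 2, 3, 4, 5]
mos, ci = mean_opinion_score(toy_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

The same summary applies to DMOS ratings, where listeners score the degradation or similarity relative to a reference sample instead of absolute quality.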

Author information

Authors and Affiliations

Department of Informatics, University of Rijeka, Radmile Matejčić 2, 51000, Rijeka, Croatia
Miran Pobar & Ivo Ipšić
Faculty of Electrical Engineering, University of Ljubljana, Tržaška 25, 1000, Ljubljana, Slovenia
Tadej Justin & France Mihelič
Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, 6000, Koper, Slovenia
Janez Žibert

Editor information

Editors and Affiliations

University of West Bohemia, 306 14, Pilsen, Czech Republic
Ivan Habernal & Václav Matoušek

About this paper

Cite this paper

Pobar, M., Justin, T., Žibert, J., Mihelič, F., Ipšić, I. (2013). A Comparison of Two Approaches to Bilingual HMM-Based Speech Synthesis. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_7

DOI: https://doi.org/10.1007/978-3-642-40585-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3
eBook Packages: Computer Science, Computer Science (R0)
