Language-Independent Acoustic Cloning of HTS Voices: An Objective Evaluation

Magariños, Carmen; Erro, Daniel; Lopez-Otero, Paula; Banga, Eduardo R.

doi:10.1007/978-3-319-49169-1_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10077)

Included in the following conference series:

International Conference on Advances in Speech and Language Technologies for Iberian Languages

Abstract

In previous work, we presented a method to combine the acoustic characteristics of one speech synthesis model with the linguistic characteristics of another. This paper presents a more extensive evaluation of the method when applied to cross-lingual adaptation. A large number of voices from a Spanish database are adapted to Basque, Catalan, English and Galician. Using a state-of-the-art speaker identification system, we show that the proposed method captures the identity of the target speakers almost as well as standard intra-lingual adaptation techniques.

References

Zen, H., Tokuda, K., Black, A.W.: Statistical parametric speech synthesis. Speech Commun. 51(11), 1039–1064 (2009)
Yamagishi, J.: Average-voice-based speech synthesis. Ph.D. dissertation, Tokyo Institute of Technology, Yokohama, Japan (2006)
Yamagishi, J., Nose, T., Zen, H., Ling, Z.H., Toda, T., Tokuda, K., King, S., Renals, S.: Robust speaker-adaptive HMM-based text-to-speech synthesis. IEEE Trans. Audio Speech Lang. Process. 17(6), 1208–1230 (2009)
Latorre, J., Iwano, K., Furui, S.: New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer. Speech Commun. 48, 1227–1242 (2006)
Wu, Y.J., Nankaku, Y., Tokuda, K.: State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis. In: Proceedings of Interspeech, pp. 528–531 (2009)
Oura, K., Yamagishi, J., Wester, M., King, S., Tokuda, K.: Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping. Speech Commun. 54, 703–714 (2012)
Dines, J., Liang, H., Saheer, L., Gibson, M., Byrne, W., Oura, K., Tokuda, K., Yamagishi, J., King, S., Wester, M., Hirsimäki, T., Karhila, R., Kurimo, M.: Personalising speech-to-speech translation: unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis. Comput. Speech Lang. 27, 420–437 (2013)
Zen, H., Braunschweiler, N., Buchholz, S., Gales, M., Knill, K., Krstulovic, S., Latorre, J.: Statistical parametric speech synthesis based on speaker and language factorization. IEEE Trans. Audio Speech Lang. Process. 20(6), 1713–1724 (2012)
Magariños, C., Erro, D., Banga, E.R.: Language-independent acoustic cloning of HTS voices: a preliminary study. In: Proceedings of ICASSP, pp. 5615–5619 (2016)
Zen, H., Nose, T., Yamagishi, J., Sako, S., Masuko, T., Black, A.W., Tokuda, K.: The HMM-based speech synthesis system (HTS) version 2.0. In: Proceedings of 6th ISCA Speech Synthesis Workshop, pp. 294–299. ISCA (2007)
Erro, D., Moreno, A., Bonafonte, A.: INCA algorithm for training voice conversion systems from nonparallel corpora. IEEE Trans. Audio Speech Lang. Process. 18(5), 944–953 (2010)
Agiomyrgiannakis, Y.: The matching-minimization algorithm, the INCA algorithm and a mathematical framework for voice conversion with unaligned corpora. In: Proceedings of ICASSP, Shanghai, pp. 5645–5649 (2016)
Hansen, J., Hasan, T.: Speaker recognition by machines and humans: a tutorial review. IEEE Signal Process. Mag. 32, 74–99 (2015)
Cumani, S., Brümmer, N., Burget, L., Laface, P.: Fast discriminative speaker verification in the i-vector space. In: Proceedings of ICASSP, pp. 4852–4855 (2011)
Dehak, N., Kenny, P.J., Dehak, R., Dumouchel, P., Ouellet, P.: Front end factor analysis for speaker verification. IEEE Trans. Audio Speech Lang. Process. 19, 788–798 (2011)
Moreno, A., Poch, D., Bonafonte, A., Lleida, E., Llisterri, J., Mariño, J.B., Nadeu, C.: Albayzin speech database: design of the phonetic corpus. In: EUROSPEECH (1993)
Sainz, I., Erro, D., Navas, E., Hernáez, I., Sánchez, J., Saratxaga, I., Odriozola, I., Luengo, I.: Aholab speech synthesizers for albayzin2010. In: Proceedings of FALA 2010, pp. 343–348 (2010)
Bonafonte, A., Aguilar, L., Esquerra, I., Oller, S., Moreno, A.: Recent work on the FESTCAT database for speech synthesis. In: Proceedings of the I Iberian SLTech, pp. 131–132 (2009)
Taylor, P., Black, A.W., Caley, R.: The architecture of the festival speech synthesis system. In: Proceedings of the ESCA Workshop in Speech Synthesis, pp. 141–151 (1998)
Rodríguez-Banga, E., García-Mateo, C., Méndez-Pazó, F., González-González, M., Magariños, C.: Cotovía: an open source TTS for Galician and Spanish. In: Proceedings of IberSPEECH, pp. 308–315. RTTH and SIG-IL (2012)
Erro, D., Sainz, I., Navas, E., Hernáez, I.: Harmonics plus noise model based vocoder for statistical parametric speech synthesis. IEEE J. Sel. Top. Signal Process. 8(2), 184–194 (2014)
Ortega-Garcia, J., Fierrez, J., Alonso-Fernandez, F., Galbally, J., Freire, M.R., Gonzalez-Rodriguez, J., Garcia-Mateo, C., Alba-Castro, J.L., Gonzalez-Agulla, E., Otero-Muras, E., Garcia-Salicetti, S., Allano, L., Ly-Van, B., Dorizzi, B., Kittler, J., Bourlai, T., Poh, N., Deravi, F., Ng, M.W.R., Fairhurst, M., Hennebert, J., Humm, A., Tistarelli, M., Brodo, L., Richiardi, J., Drygajlo, A., Ganster, H., Sukno, F., Pavani, S.K., Frangi, A., Akarun, L., Savran, A.: The multi-scenario multi-environment BioSecure multimodal database (BMDB). IEEE Trans. Pattern Anal. Mach. Intell. 32(4), 1097–1111 (2009)
Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., Vesely, K.: The Kaldi speech recognition toolkit. In: IEEE Workshop on Automatic Speech Recognition and Understanding (2011)

Acknowledgments

This research was funded by the Spanish Government (projects TEC2015-65345-P and BES-2013-063708), the Galician Government through the research contract GRC2014/024 (category: Grupos de Referencia Competitiva 2014) and ‘AtlantTIC’ CN2012/160, the European Regional Development Fund (ERDF) and the COST Action IC1206.

Author information

Authors and Affiliations

Multimedia Technology Group (GTM), AtlantTIC, University of Vigo, Vigo, Spain
Carmen Magariños, Paula Lopez-Otero & Eduardo R. Banga
IKERBASQUE – Aholab, University of the Basque Country, Bilbao, Spain
Daniel Erro

Corresponding author

Correspondence to Carmen Magariños.

Editor information

Editors and Affiliations

INESC-ID/IST, Universidade de Lisboa, Lisbon, Portugal
Alberto Abad
I3A/University of Zaragoza, Zaragoza, Spain
Alfonso Ortega
DETI/IEETA, University of Aveiro, Aveiro, Portugal
António Teixeira
AtlantTIC Research Center, Universidad de Vigo, Vigo, Spain
Carmen García Mateo
Universitat Politècnica de València, Valencia, Spain
Carlos D. Martínez Hinarejos
University of Coimbra, Coimbra, Portugal
Fernando Perdigão
INESC-ID/ISCTE-IUL, Lisbon, Portugal
Fernando Batista
INESC-ID/IST, Universidade de Lisboa, Lisbon, Portugal
Nuno Mamede

About this paper

Magariños, C., Erro, D., Lopez-Otero, P., Banga, E.R. (2016). Language-Independent Acoustic Cloning of HTS Voices: An Objective Evaluation. In: Abad, A., et al. Advances in Speech and Language Technologies for Iberian Languages. IberSPEECH 2016. Lecture Notes in Computer Science, vol 10077. Springer, Cham. https://doi.org/10.1007/978-3-319-49169-1_6

DOI: https://doi.org/10.1007/978-3-319-49169-1_6
Published: 04 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49168-4
Online ISBN: 978-3-319-49169-1
eBook Packages: Computer Science (R0)