Unit Selection for Speech Synthesis Based on Acoustic Criteria

Rouibia, Soufiane; Rosec, Olivier; Moudenc, Thierry

doi:10.1007/11551874_36


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3658)

Included in the following conference series:

International Conference on Text, Speech and Dialogue


Abstract

This paper presents a new approach to unit selection for corpus-based speech synthesis, in which units are selected according to acoustic criteria. In a training stage, acoustic clustering is carried out using context-dependent HMMs. In the synthesis stage, an acoustic target is generated and divided into segments corresponding to the required unit sequence. The acoustic unit sequence that best matches the target is then selected. Tests show the relevance of the proposed method.
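The synthesis stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance measures or the search procedure, so this sketch assumes a Euclidean distance between mean feature vectors as the target cost, a Euclidean distance between boundary frames as the join cost, and a Viterbi-style dynamic-programming search. All names (`target_cost`, `join_cost`, `select_units`, `join_weight`) are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def mean_vector(frames):
    """Average a list of equal-length feature frames into one vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def target_cost(target_frames, unit_frames):
    """Assumed target cost: distance between mean feature vectors."""
    return dist(mean_vector(target_frames), mean_vector(unit_frames))


def join_cost(prev_frames, next_frames):
    """Assumed join cost: distance between the frames at the boundary."""
    return dist(prev_frames[-1], next_frames[0])


def select_units(target_segments, candidates, join_weight=1.0):
    """Return one candidate index per target segment.

    target_segments[i] is the list of frames of the i-th target segment;
    candidates[i] is the list of candidate units (each a list of frames)
    available for that segment. The path minimising the summed target
    and weighted join costs is found by dynamic programming.
    """
    # best[i][j] = (cumulative cost, back-pointer into row i - 1)
    best = [[(target_cost(target_segments[0], u), None) for u in candidates[0]]]
    for i in range(1, len(target_segments)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_weight * join_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(target_segments[i], u), back))
        best.append(row)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = [j]
    for i in range(len(best) - 1, 0, -1):
        j = best[i][j][1]
        path.append(j)
    path.reverse()
    return path


# Toy usage: two target segments, two candidate units each (2-D features).
target = [[[0.0, 0.0]], [[1.0, 1.0]]]
units = [
    [[[5.0, 5.0]], [[0.0, 0.0]]],
    [[[1.0, 1.0]], [[9.0, 9.0]]],
]
chosen = select_units(target, units)
```

In this toy case the search picks the second candidate for the first segment and the first candidate for the second, since those units lie closest to the target and join smoothly. A real system would operate on spectral and prosodic features of a large unit database rather than hand-written vectors.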



Author information

Authors and Affiliations

France Telecom, R&D Division, 2 avenue Pierre Marzin, 22307, Lannion Cedex, France
Soufiane Rouibia, Olivier Rosec & Thierry Moudenc


Editor information

Editors and Affiliations

Department of Computer Science, University of West Bohemia in Pilsen, Univerzitni 8, 30614, Plzen, Czech Republic
Václav Matoušek, Pavel Mautner & Tomáš Pavelka


About this paper

Cite this paper

Rouibia, S., Rosec, O., Moudenc, T. (2005). Unit Selection for Speech Synthesis Based on Acoustic Criteria. In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_36


DOI: https://doi.org/10.1007/11551874_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28789-6
Online ISBN: 978-3-540-31817-0
eBook Packages: Computer Science, Computer Science (R0)
