Comparison of Phonetic Segmentation Tools for European Portuguese

Figueira, Luís; Oliveira, Luís C.

doi:10.1007/978-3-540-85980-2_32

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5190)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

Currently, most of the text-to-speech synthesis systems that produce the most natural output are based on the selection and concatenation of variable-size speech units chosen from an inventory of recordings. Building such an inventory requires segmenting the recordings into phonetic units, and there are many different approaches to automatic speech segmentation. The most widely used are based on Hidden Markov Models (HMM) [1,2,3] or Artificial Neural Networks (ANN) [4], though algorithms based on Dynamic Time Warping (DTW) [3,4,5] are also popular. Techniques involving speaker adaptation of acoustic models are usually more precise, but demand larger amounts of training data, which are not always available.

In this work we compare several phonetic segmentation tools, based on different technologies, and study the transition types for which each tool achieves the best results. To evaluate the tools, we use as the criterion the number of phonetic transitions (phone boundaries) with an error below 20 ms relative to the manual segmentation. This value is commonly used in the literature [6] as an upper bound on the phone boundary error. We then combine the individual segmentation tools, taking advantage of their differing behavior according to the phonetic transition type. This approach improves on the results obtained with any standalone tool. Since the goal of this work is the evaluation of fully automatic tools, we did not use any manual segmentation data to train models. The only manual information used during this study was the phonetic sequence.
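
The evaluation criterion and the combination step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tool names (tool_a, tool_b), the transition labels, and the boundary times are all hypothetical, and the rule of picking, for each transition type, the tool with the highest score is an assumption about how such a combination could work.

```python
from collections import defaultdict

TOLERANCE_MS = 20  # error threshold stated in the abstract


def within_tolerance(auto_ms, manual_ms, tol_ms=TOLERANCE_MS):
    """One flag per boundary: True when the absolute error is below tol_ms."""
    if not manual_ms or len(auto_ms) != len(manual_ms):
        raise ValueError("need two non-empty boundary lists of equal length")
    return [abs(a - m) < tol_ms for a, m in zip(auto_ms, manual_ms)]


def accuracy(auto_ms, manual_ms, tol_ms=TOLERANCE_MS):
    """Fraction of boundaries whose error is below the tolerance."""
    hits = within_tolerance(auto_ms, manual_ms, tol_ms)
    return sum(hits) / len(hits)


def accuracy_by_transition(auto_ms, manual_ms, transition_types, tol_ms=TOLERANCE_MS):
    """Same measure, computed separately for each transition type."""
    counts = defaultdict(lambda: [0, 0])  # type -> [hits, total]
    hits = within_tolerance(auto_ms, manual_ms, tol_ms)
    for hit, kind in zip(hits, transition_types):
        counts[kind][0] += hit
        counts[kind][1] += 1
    return {kind: h / n for kind, (h, n) in counts.items()}


def best_tool_per_transition(scores_by_tool):
    """Map each transition type to the tool with the highest score (assumed rule)."""
    best = {}
    for tool, scores in scores_by_tool.items():
        for kind, score in scores.items():
            if kind not in best or score > scores_by_tool[best[kind]][kind]:
                best[kind] = tool
    return best


def combine(boundaries_by_tool, transition_types, best_tool, fallback):
    """Take each boundary from the tool preferred for its transition type."""
    return [
        boundaries_by_tool[best_tool.get(kind, fallback)][i]
        for i, kind in enumerate(transition_types)
    ]


# Hypothetical toy data: boundary times in milliseconds.
manual_ms = [100, 250, 400, 520]
transition_types = ["vowel-stop", "stop-vowel", "vowel-stop", "stop-vowel"]
boundaries_by_tool = {
    "tool_a": [105, 290, 410, 560],
    "tool_b": [130, 255, 430, 515],
}

scores_by_tool = {
    tool: accuracy_by_transition(bounds, manual_ms, transition_types)
    for tool, bounds in boundaries_by_tool.items()
}
best_tool = best_tool_per_transition(scores_by_tool)
combined_ms = combine(boundaries_by_tool, transition_types, best_tool, fallback="tool_a")
```

On this toy data each tool alone places half of its boundaries within 20 ms of the reference, while the combined segmentation places all four. The sketch selects the preferred tool on the same data it is scored on, which is optimistic; the abstract does not state how the selection was made in the study.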

References

1. Toledano, D.T., Gómez, L.A., Grande, L.V.: Automatic phonetic segmentation. IEEE Transactions on Speech and Audio Processing 11 (November 2003)
2. Huggins-Daines, D., Rudnicky, A.I.: A Constrained Baum-Welch Algorithm for Improved and Efficient Training. In: Proc. Interspeech 2006 - 9th International Conference on Spoken Language Processing, Pittsburgh, USA (2006)
3. Black, A.W., Kominek, J., Bennett, C.: Evaluating and Correcting Phoneme Segmentation for Unit Selection Synthesis. In: Proc. Eurospeech, Geneva, Switzerland, pp. 313–316 (2003)
4. Malfrère, F., Deroo, O., Dutoit, T.: Phonetic alignment: speech synthesis based vs. hybrid HMM/ANN. In: Proc. 5th International Conference on Spoken Language Processing (1998)
5. Paulo, S., Oliveira, L.C.: DTW-based Phonetic Alignment Using Multiple Acoustic Features. In: Proc. Eurospeech, Geneva, Switzerland, pp. 309–312 (2003)
6. Adell, J., Bonafonte, A.: Toward Phone Segmentation for Concatenative Speech Synthesis. In: Proc. 5th ISCA Workshop on Speech Synthesis (2004)
7. Neto, J.P., Martins, C., Meinedo, H., Almeida, L.B.: AUDIMUS - Sistema de Reconhecimento de Fala Contínua para o Português Europeu. In: PROPOR 1999 - IV Encontro para o Processamento Computacional da Língua Portuguesa Escrita e Falada, Évora (1999)
8. Meinedo, H., Caseiro, D., Neto, J.P., Trancoso, I.: AUDIMUS.media: a Broadcast News speech recognition system for the European Portuguese language. In: Mamede, N.J., Baptista, J., Trancoso, I., das Graças Volpe Nunes, M. (eds.) PROPOR 2003. LNCS, vol. 2721, pp. 9–17. Springer, Heidelberg (2003)
9. Young, S., Ollason, D., Valtchev, V., Woodland, P.: The HTK Book (for HTK Version 3.2). Cambridge University Engineering Department (2002)
10. Prahallad, K., Black, A.W., Ravishankar, M.: Sub-phonetic Modeling for Capturing Pronunciation Variations for Conversational Speech Synthesis. In: Proc. ICASSP (2006)
11. Black, A.W., Lenzo, K.A.: Building Synthetic Voices, For FestVox, 2.1 edn. Language Technologies Institute, Carnegie Mellon University and Cepstral, LLC (2006), http://www.festvox.org

Author information

Authors and Affiliations

L2F Spoken Language Systems Lab., INESC-ID/IST, Rua Alves Redol 9, 1000-029, Lisbon, Portugal
Luís Figueira & Luís C. Oliveira

Editor information

António Teixeira, Vera Lúcia Strube de Lima, Luís Caldas de Oliveira, Paulo Quaresma

Cite this paper

Figueira, L., Oliveira, L.C. (2008). Comparison of Phonetic Segmentation Tools for European Portuguese. In: Teixeira, A., de Lima, V.L.S., de Oliveira, L.C., Quaresma, P. (eds) Computational Processing of the Portuguese Language. PROPOR 2008. Lecture Notes in Computer Science, vol 5190. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85980-2_32

DOI: https://doi.org/10.1007/978-3-540-85980-2_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85979-6
Online ISBN: 978-3-540-85980-2
eBook Packages: Computer Science, Computer Science (R0)
