Boundary Refining Aiming at Speech Synthesis Applications

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5190)

Abstract

In concatenative synthesis, speech is produced by joining segments selected automatically from the units of a previously segmented database. The quality of the resulting synthetic speech generally improves when accurate segmentation tools are used. The performance of these tools is often enhanced by a hybrid approach that combines HMM modeling with a boundary refining process. Such refinement has been carried out successfully with techniques based on neural networks. This paper presents a set of networks that outperform other topologies discussed in the literature. These networks are trained by clustering the training set so that similar phonetic transitions are grouped together.
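The hybrid scheme described in the abstract can be illustrated with a minimal sketch: an initial HMM boundary is refined by a scorer chosen according to the cluster of the phonetic transition. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `BROAD_CLASS` table, the use of broad-class pairs as cluster keys, the one-dimensional `frames` feature, the `search_radius` parameter, and the `local_change` function, which only stands in for a trained neural network.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical broad phonetic classes; the paper's actual grouping of
# similar transitions is not given in the abstract.
BROAD_CLASS: Dict[str, str] = {
    "a": "vowel", "e": "vowel", "i": "vowel", "o": "vowel", "u": "vowel",
    "p": "plosive", "t": "plosive", "k": "plosive",
    "s": "fricative", "f": "fricative",
    "m": "nasal", "n": "nasal",
}

ClusterKey = Tuple[str, str]
# A scorer rates how likely frame index t is to be the true boundary.
Scorer = Callable[[Sequence[float], int], float]


def transition_cluster(left: str, right: str) -> ClusterKey:
    """Map a phone pair to its cluster key (assumed: pair of broad classes)."""
    return (BROAD_CLASS[left], BROAD_CLASS[right])


def local_change(frames: Sequence[float], t: int) -> float:
    """Stand-in scorer: feature change between frames t-1 and t.

    In the approach the abstract describes, a neural network trained on
    one cluster of transitions would take this role.
    """
    return abs(frames[t] - frames[t - 1])


def refine_boundary(
    frames: Sequence[float],
    hmm_boundary: int,
    left: str,
    right: str,
    scorers: Dict[ClusterKey, Scorer],
    search_radius: int = 3,
) -> int:
    """Return the best-scoring frame index near the HMM boundary."""
    scorer = scorers[transition_cluster(left, right)]
    # Candidates start at 1 because the stand-in scorer looks at frame t-1.
    lo = max(1, hmm_boundary - search_radius)
    hi = min(len(frames) - 1, hmm_boundary + search_radius)
    return max(range(lo, hi + 1), key=lambda t: scorer(frames, t))


if __name__ == "__main__":
    frames = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
    scorers = {("vowel", "fricative"): local_change}
    print(refine_boundary(frames, 2, "a", "s", scorers))
```

Keying the scorers by transition cluster means each model only has to learn boundaries of acoustically similar transitions, which is the motivation the abstract gives for clustering the training set.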

This work was partially supported by the Brazilian National Council for Scientific and Technological Development (CNPq), Studies and Projects Funding Body (FINEP), and Dígitro Tecnologia Ltda.

Editor information

António Teixeira, Vera Lúcia Strube de Lima, Luís Caldas de Oliveira, Paulo Quaresma

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nicodem, M.V., Kafka, S.G., Seara, R., Seara, R. (2008). Boundary Refining Aiming at Speech Synthesis Applications. In: Teixeira, A., de Lima, V.L.S., de Oliveira, L.C., Quaresma, P. (eds) Computational Processing of the Portuguese Language. PROPOR 2008. Lecture Notes in Computer Science, vol 5190. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85980-2_8

  • DOI: https://doi.org/10.1007/978-3-540-85980-2_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85979-6

  • Online ISBN: 978-3-540-85980-2

  • eBook Packages: Computer Science, Computer Science (R0)