Pitch-Asynchronous Overlap-Add Waveform-Concatenation Speech Synthesis by Optimizing Phase Spectrum in Frequency Domain

  • Chapter
Complex-Valued Neural Networks

Part of the book series: Studies in Computational Intelligence (SCI, volume 400)

Abstract

To obtain high-quality speech synthesis, the utterance elements to be concatenated should be recorded in units that are as long as possible, so that listeners do not perceive the result as unnatural. For train announcements, for example, we prepare word- or segment-long utterances and concatenate them to generate simple sentences. However, synthesizing the free-form sentences needed in daily life with this method would require a database too large to build by recording real utterances. An alternative is to sample short utterance elements and store their short-time spectra for concatenation. In practice, however, the concatenation is very difficult to tune, because producing reasonable speech depends on reproducing waveform features such as pulse sharpness. In this chapter, we present a complex-valued neural network that adaptively adjusts the phase values of the frequency spectra to realize an ideal concatenation. The network operates in the frequency domain to obtain the desired waveforms in the time domain: a phase shift in the frequency domain corresponds to a temporal shift in the time domain. Such frequency-domain processing with complex-valued neural networks is also useful in other fields, such as image processing, where we deal with spatial frequency.
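The correspondence between a phase shift in the frequency domain and a temporal shift in the time domain can be illustrated with a small sketch. This is not the chapter's network, only the standard shift property of the discrete Fourier transform. The function names `dft`, `idft`, and `shift_by_phase`, the 8-sample pulse, and the delay of 3 samples are all assumptions chosen for illustration.

```python
import cmath

def dft(x):
    """Plain O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft, including the 1/n normalization."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def shift_by_phase(x, d):
    """Circularly delay x by d samples by rotating only the phase of each bin.

    Bin k is multiplied by exp(-2*pi*i*k*d/n), so every magnitude is unchanged.
    (Hypothetical helper for illustration; d is assumed to be an integer.)
    """
    n = len(x)
    X = dft(x)
    Y = [X[k] * cmath.exp(-2j * cmath.pi * k * d / n) for k in range(n)]
    return [v.real for v in idft(Y)]

# A single sharp pulse at sample 1, delayed by 3 samples via phase rotation.
pulse = [0.0] * 8
pulse[1] = 1.0
shifted = shift_by_phase(pulse, 3)
peak = max(range(len(shifted)), key=lambda t: shifted[t])
print(peak)  # index of the pulse after the shift
```

Only the phase spectrum is modified here; the magnitude spectrum is left untouched. That is why the pulse keeps its sharpness and simply moves in time. The shift is circular because the DFT treats the frame as periodic.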

Author information

Corresponding author

Correspondence to Akira Hirose.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hirose, A. (2012). Pitch-Asynchronous Overlap-Add Waveform-Concatenation Speech Synthesis by Optimizing Phase Spectrum in Frequency Domain. In: Complex-Valued Neural Networks. Studies in Computational Intelligence, vol 400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27632-3_11

  • DOI: https://doi.org/10.1007/978-3-642-27632-3_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27631-6

  • Online ISBN: 978-3-642-27632-3

  • eBook Packages: Engineering, Engineering (R0)
