Pitch-Asynchronous Overlap-Add Waveform-Concatenation Speech Synthesis by Optimizing Phase Spectrum in Frequency Domain

Hirose, Akira

doi:10.1007/978-3-642-27632-3_11

Part of the book series: Studies in Computational Intelligence (SCI, volume 400)

Abstract

To obtain high-quality synthesized speech, the utterance elements to be concatenated should be recorded in segments that are as long as possible, so that listeners do not perceive unnaturalness. Train announcements, for example, are produced by preparing word- or segment-length utterances and concatenating them into simple sentences. However, synthesizing the free-form sentences needed in daily life with this method would require a database so large that it could not be built by recording real utterances. Instead, speech may be synthesized by sampling short utterance elements and storing their short-time spectra for later concatenation. In practice, however, the concatenation is very difficult to tune, because producing reasonable speech requires reproducing waveform features such as pulse sharpness. In this chapter, we present a complex-valued neural network that adaptively adjusts the phase values of the frequency spectra to realize an ideal concatenation. The network operates in the frequency domain to obtain the desired waveforms in the time domain, since a phase shift in the frequency domain corresponds to a temporal shift in the time domain. Such frequency-domain processing with complex-valued neural networks is also useful in other fields, such as image processing, where spatial frequency is handled.
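The correspondence between phase shift and temporal shift that the abstract relies on is the Fourier shift theorem: delaying a length-N signal by tau samples multiplies its k-th DFT bin by exp(-j 2 pi k tau / N), a phase that grows linearly with frequency while every magnitude stays unchanged. The sketch below is only an illustration of that property using NumPy's real FFT; it is not the chapter's complex-valued neural network, and the function name `shift_by_phase` is hypothetical. It assumes an integer delay, for which the result equals a circular shift exactly.

```python
import numpy as np


def shift_by_phase(x: np.ndarray, tau: int) -> np.ndarray:
    """Circularly delay a real signal x by tau samples using only a phase rotation.

    Hypothetical helper for illustration: each bin X[k] is multiplied by
    exp(-j*2*pi*k*tau/N), which leaves |X[k]| unchanged.
    """
    n = len(x)
    spectrum = np.fft.rfft(x)                  # bins k = 0 .. n//2
    k = np.arange(len(spectrum))
    rotated = spectrum * np.exp(-2j * np.pi * k * tau / n)  # linear-phase term
    return np.fft.irfft(rotated, n=n)          # back to the time domain


rng = np.random.default_rng(0)
x = rng.standard_normal(64)                    # stand-in for a short speech segment
y = shift_by_phase(x, 5)

# The waveform moves in time ...
print(np.allclose(y, np.roll(x, 5)))
# ... while the magnitude (short-time amplitude) spectrum is untouched.
print(np.allclose(np.abs(np.fft.rfft(y)), np.abs(np.fft.rfft(x))))
```

Both checks print `True`. This is why storing only magnitude spectra is not enough for concatenation: the phase decides where energy lands in time, and phases that do not vary linearly with frequency also change the waveform's shape (for example, how sharp a glottal pulse appears), which is what a phase-adjusting network can act on.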

Author information

Authors and Affiliations

Department of Electrical Engineering, The University of Tokyo, Hongo 7-3-1, 113-8656, Tokyo, Japan
Akira Hirose

Corresponding author

Correspondence to Akira Hirose.

About this chapter

Cite this chapter

Hirose, A. (2012). Pitch-Asynchronous Overlap-Add Waveform-Concatenation Speech Synthesis by Optimizing Phase Spectrum in Frequency Domain. In: Complex-Valued Neural Networks. Studies in Computational Intelligence, vol 400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27632-3_11

DOI: https://doi.org/10.1007/978-3-642-27632-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27631-6
Online ISBN: 978-3-642-27632-3
eBook Packages: Engineering, Engineering (R0)