Improvement of time alignment of the speech signals to be used in voice conversion

Mozaffari, Fatemeh; Sayadian, Abolghasem

doi:10.1007/s10772-018-9490-0

Published: 15 January 2018

Volume 21, pages 79–84, (2018)

International Journal of Speech Technology

Abstract

One of the main applications of time alignment is voice conversion based on a parallel corpus. Various methods, such as dynamic time warping (DTW) and hidden Markov models, have been proposed in the literature for the time alignment of two speech signals. In this paper, we introduce several modifications to DTW in order to reduce the time alignment error. These modifications are refinement by applying a threshold, normalization, and comparison of the preceding and following frames, all aimed at establishing a sound correspondence between the utterances of two different speakers in a parallel corpus. An evaluation of this approach on a set of corpus sentences indicates a significant improvement in time alignment: compared with DTW, the error decreases by at least about 4%, and in some cases by 15%.
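For orientation, the baseline that the abstract compares against can be sketched as follows. This is a minimal textbook DTW, not the authors' modified algorithm: the threshold refinement, normalization, and neighboring-frame comparisons are not specified in the abstract and are not reproduced here. The function name `dtw_align`, the Euclidean frame distance, and the three-way step pattern are assumptions chosen for illustration.

```python
import numpy as np


def dtw_align(x, y):
    """Align two feature sequences (frames x dims) with plain DTW.

    Returns the accumulated cost and the warping path as a list of
    (source_frame, target_frame) index pairs.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Treat 1-D input as a sequence of scalar frames.
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)

    # Euclidean distance between every source frame and every target frame
    # (an assumed local distance; the paper may use a different one).
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)

    # acc[i, j] holds the cost of the best path aligning x[:i] with y[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # both sequences advance
                acc[i - 1, j],      # only the source advances
                acc[i, j - 1],      # only the target advances
            )

    # Backtrack from the end to recover the frame-to-frame correspondence.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin((acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return float(acc[n, m]), path


# Toy usage: the second sequence repeats one frame of the first.
cost, path = dtw_align([0, 1, 2, 3], [0, 1, 1, 2, 3])
```

In a voice conversion setting the inputs would typically be per-frame spectral features of the source and target utterances, and the returned path would supply the frame pairs used to train the conversion function.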

Author information

Authors and Affiliations

Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran
Fatemeh Mozaffari & Abolghasem Sayadian

Corresponding author

Correspondence to Fatemeh Mozaffari.

About this article

Cite this article

Mozaffari, F., Sayadian, A. Improvement of time alignment of the speech signals to be used in voice conversion. Int J Speech Technol 21, 79–84 (2018). https://doi.org/10.1007/s10772-018-9490-0

Received: 06 July 2017
Accepted: 07 January 2018
Published: 15 January 2018
Issue Date: March 2018
DOI: https://doi.org/10.1007/s10772-018-9490-0
