A Method of Real-Time Non-uniform Speech Stretching

  • Adam Kupryjanow
  • Andrzej Czyzewski
Part of the Communications in Computer and Information Science book series (CCIS, volume 314)

Abstract

A method of real-time non-uniform speech stretching is presented. The proposed solution is based on the well-known SOLA (Synchronous Overlap and Add) algorithm. Non-uniform time-scale modification is achieved by adjusting the time-scaling factor in accordance with the signal content. The scaling factor is selected depending on the speech unit (vowel or consonant), the instantaneous rate of speech (ROS), and the presence of speech in the signal. This keeps the difference in duration between the input and output signals as small as possible while preserving high naturalness and quality of the modified speech. In the experimental part of the paper, the accuracy of the proposed ROS estimator is examined. The quality of speech stretched using the proposed method is assessed in subjective tests.
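
The idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the selection rules or any parameter values, so the function names (select_scale, sola_stretch), the ROS threshold, the scale factors and the frame sizes below are all assumptions chosen only for illustration. The sketch shows a per-frame scale factor driving a basic SOLA loop, in which each input frame is placed near its scaled target position, aligned by normalised cross-correlation and cross-faded into the output.

```python
import math

def select_scale(is_speech, unit, ros, ros_threshold=5.0,
                 vowel_scale=2.0, consonant_scale=1.5):
    """Pick a time-scale factor for one frame.

    Illustrative rule only (all values are assumptions, not the paper's):
    pauses and slow speech are left unchanged, fast speech is stretched,
    vowels more than consonants.
    """
    if not is_speech or ros < ros_threshold:
        return 1.0
    return vowel_scale if unit == "vowel" else consonant_scale

def _similarity(a, b):
    """Normalised cross-correlation of two equal-length sequences."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0.0 else 0.0

def sola_stretch(x, scales, frame_len=200, hop_in=50, search=20):
    """Basic SOLA with one scale factor per input frame (a sketch)."""
    if scales and max(scales) * hop_in + 2 * search + 1 >= frame_len:
        raise ValueError("scale too large for this frame_len/hop_in/search")
    out = list(x[:frame_len])      # the first frame is copied unchanged
    target = 0.0                   # ideal output position of the current frame
    for i, scale in enumerate(scales, start=1):
        frame = x[i * hop_in:i * hop_in + frame_len]
        if len(frame) < frame_len:
            break                  # not enough input left for a full frame
        target += scale * hop_in   # synthesis hop = scale * analysis hop
        nominal = int(round(target))
        best_pos, best_score = nominal, -2.0
        # Search around the nominal position for the best waveform match.
        for k in range(-search, search + 1):
            pos = nominal + k
            overlap = min(len(out) - pos, frame_len)
            if pos < 0 or overlap <= 0:
                continue
            score = _similarity(out[pos:pos + overlap], frame[:overlap])
            if score > best_score:
                best_pos, best_score = pos, score
        overlap = min(len(out) - best_pos, frame_len)
        # Linear cross-fade over the overlap, then append the rest.
        for n in range(overlap):
            w = n / overlap
            out[best_pos + n] = (1.0 - w) * out[best_pos + n] + w * frame[n]
        out.extend(frame[overlap:])
    return out

# Usage on a synthetic tone (a stand-in for speech, for illustration only).
x = [math.sin(2 * math.pi * n / 80) for n in range(4000)]
n_frames = (len(x) - 200) // 50
same = sola_stretch(x, [1.0] * n_frames)   # roughly the input length
slow = sola_stretch(x, [2.0] * n_frames)   # roughly twice as long
```

In a non-uniform setting the scales list would be filled frame by frame, for example with select_scale(is_speech, unit, ros) fed by a voice activity detector, a vowel detector and an ROS estimator. Leaving pauses and slow passages at a factor of 1.0 is one way to limit the extra duration that stretching introduces.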

Keywords

Time-scale Modification, Voice Detection, Vowels Detection, Rate of Speech Estimation

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Adam Kupryjanow (1)
  • Andrzej Czyzewski (1)
  1. Multimedia Systems Department, Gdansk University of Technology, Gdansk, Poland
