A Voice Morphing Model Based on the Gaussian Mixture Model and Generative Topographic Mapping

  • Murad A. Rassam
  • Rasha Almekhlafi
  • Eman Alosaily
  • Haneen Hassan
  • Reem Hassan
  • Eman Saeed
  • Elham Alqershi
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1073)


In this paper, a new model for voice morphing is proposed, in which the spectral characteristics of a source speaker's speech are transformed so that the speech sounds as if it were spoken by a designated target speaker. The proposed model performs a phoneme segmentation of the voice signal and then transforms the spectral characteristics of each segment using a Linear Prediction model. The spectral features, extracted using the Linear Prediction Coding (LPC) technique, are aligned using Dynamic Time Warping (DTW). The Generative Topographic Mapping (GTM) method is used to model the LPC features, and the transformation is then achieved using the Gaussian Mixture Model (GMM). The transformed codebooks are finally converted to prediction coefficients, and the excitation signal is filtered in order to synthesize the speech. A correlation test performed between the source and target signals showed a high correlation. The results indicate that the proposed model is promising in terms of recognizing full sentences in addition to individual words.
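The alignment step mentioned in the abstract can be illustrated with a minimal Dynamic Time Warping sketch. This is not the authors' implementation: the function name `dtw_align`, the Euclidean frame distance, and the symmetric step pattern (match, insertion, deletion) are assumptions chosen for illustration, and the toy one-dimensional frames stand in for real LPC feature vectors.

```python
import math


def dtw_align(source, target):
    """Align two sequences of feature vectors with classic DTW.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    mapping source frame i to target frame j.
    Assumption: Euclidean distance between frames; the paper does not
    state which local distance or step pattern it uses.
    """
    n, m = len(source), len(target)
    inf = float("inf")
    # cost[i][j] = minimal cumulative distance aligning source[:i] with target[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(source[i - 1], target[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # match both frames
                cost[i - 1][j],      # advance source only
                cost[i][j - 1],      # advance target only
            )
    # Backtrack from the final cell to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# Toy example: the target repeats its first frame, so the first source
# frame should be stretched over two target frames.
src = [(0.0,), (1.0,), (2.0,)]
tgt = [(0.0,), (0.0,), (1.0,), (2.0,)]
total, warp = dtw_align(src, tgt)
print(total, warp)
```

In a pipeline like the one described, each `(i, j)` pair in the returned path would pair a source frame with a target frame, giving time-aligned training data for the later mapping stage.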


Voice morphing, DTW, GTM, GMM, Signal processing, Correlation



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Murad A. Rassam (1, 2)
  • Rasha Almekhlafi (2)
  • Eman Alosaily (2)
  • Haneen Hassan (2)
  • Reem Hassan (2)
  • Eman Saeed (2)
  • Elham Alqershi (2)

  1. Information Technology Department, College of Computer, Qassim University, Buraidah, Kingdom of Saudi Arabia
  2. Faculty of Engineering and Information Technology, Taiz University, Taiz, Yemen
