Personalized Voice Assignment Techniques for Synchronized Scenario Speech Output in Entertainment Systems

  • Shin-ichi Kawamoto
  • Tatsuo Yotsukura
  • Satoshi Nakamura
  • Shigeo Morishima
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6774)


This paper describes voice assignment techniques for synchronized scenario speech output in an instant casting movie system, which enables anyone to star in a movie using his or her own face and voice. Two prototype systems were implemented, and both worked well for a wide range of participants, from children to the elderly.


Keywords: instant casting movie system, post-recording, speaker similarity, voice morphing, synchronized speech output
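The keywords mention voice morphing, a common way to blend a synthetic voice toward a participant's voice. One standard formulation interpolates two time-aligned spectral envelopes in the log-spectral domain, with a morph rate controlling how close the result is to either speaker. The sketch below is a minimal, hypothetical illustration of that idea (the helper name, inputs, and the assumption that the envelopes are already aligned are not from the paper):

```python
import numpy as np

def morph_spectra(spec_a, spec_b, alpha):
    """Interpolate two time-aligned spectral envelopes.

    spec_a, spec_b: positive spectral-envelope magnitudes of equal shape,
                    assumed already aligned frame-by-frame (an assumption;
                    real morphing systems also align time and pitch).
    alpha: morph rate in [0, 1]; 0 reproduces speaker A, 1 speaker B.
    """
    log_a = np.log(np.asarray(spec_a, dtype=float))
    log_b = np.log(np.asarray(spec_b, dtype=float))
    # Linear interpolation in the log domain = geometric interpolation
    # of magnitudes, which behaves better perceptually than linear
    # interpolation of raw spectra.
    return np.exp((1.0 - alpha) * log_a + alpha * log_b)
```

At alpha = 0.5 this yields the frame-wise geometric mean of the two envelopes; a full morphing system would additionally interpolate fundamental frequency and time alignment, not only the spectral envelope.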





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Shin-ichi Kawamoto (1, 3)
  • Tatsuo Yotsukura (2)
  • Satoshi Nakamura (3)
  • Shigeo Morishima (4)
  1. Japan Advanced Institute of Science and Technology, Nomi, Japan
  2. OLM Digital Inc., Setagaya-ku, Japan
  3. National Institute of Information and Communications Technology, Soraku-gun, Japan
  4. Waseda University, Shinjuku-ku, Japan
