Analysis on Effects of Text-to-Speech and Avatar Agent in Evoking Users’ Spontaneous Listener’s Reactions

  • Conference paper

Abstract

This paper reports an analysis of the effects of text-to-speech (TTS) and an avatar agent in evoking users’ spontaneous backchannels. We constructed an HMM-based dialogue-style TTS system that generates human-like cues that evoke users’ backchannels. We also constructed an avatar agent that can produce several listener reactions. A spoken dialogue system for information navigation was implemented and evaluated in terms of evoked user backchannels. We conducted user experiments, and the results indicated that (1) the user backchannels evoked by our TTS are more informative for the system in detecting users’ feelings than those evoked by conventional reading-style TTS and (2) the use of an avatar agent can invite more user backchannels.

Author information

Corresponding author

Correspondence to Teruhisa Misu.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this paper

Cite this paper

Misu, T., Mizukami, E., Shiga, Y., Kawamoto, S., Kawai, H., Nakamura, S. (2011). Analysis on Effects of Text-to-Speech and Avatar Agent in Evoking Users’ Spontaneous Listener’s Reactions. In: Delgado, RC., Kobayashi, T. (eds) Proceedings of the Paralinguistic Information and its Integration in Spoken Dialogue Systems Workshop. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1335-6_10

  • DOI: https://doi.org/10.1007/978-1-4614-1335-6_10

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-1334-9

  • Online ISBN: 978-1-4614-1335-6

  • eBook Packages: Engineering (R0)
