Predicting When People Will Speak to a Humanoid Robot

  • Conference paper
Natural Interaction with Robots, Knowbots and Smartphones

Abstract

We address the novel problem of predicting when a user is likely to begin speaking to a humanoid robot. Human speakers usually take the state of their addressee into account when choosing the moment to begin speaking, and our idea is to exploit this convention in a system that interprets audio input. The proposed method uses machine learning with the robot’s behaviors (such as its posture, motion, and utterances) as input features. We created a data set in which three human participants indicated in real time whether or not they would be likely to begin speaking to the robot. We collected the parts to which all three participants gave the same label and used them as the training and evaluation data. In an experimental evaluation, our model correctly predicted 88.5% of these common parts in an open test. This result is similar to the cross-validation results, which indicates that the model does not depend on a specific training data set. A possible application of the model is eliminating environmental noise that occurs at times when a cooperative user is unlikely to begin speaking to the robot.
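The step of keeping only the parts on which all three annotators agree can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the real-time annotations have been discretized into equal-length frames with binary labels (1 = likely to begin speaking, 0 = not likely), and the names `unanimous_labels` and `common_parts` and the placeholder feature values are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def unanimous_labels(annotations: Sequence[Sequence[int]]) -> List[Optional[int]]:
    """Return the shared label for each frame, or None where annotators disagree.

    `annotations` holds one label sequence per annotator (assumed frame-aligned).
    """
    lengths = {len(a) for a in annotations}
    if len(lengths) != 1:
        raise ValueError("all annotators must label the same number of frames")
    shared: List[Optional[int]] = []
    for frame_labels in zip(*annotations):
        first = frame_labels[0]
        agreed = all(label == first for label in frame_labels)
        shared.append(first if agreed else None)
    return shared


def common_parts(
    features: Sequence[object], annotations: Sequence[Sequence[int]]
) -> List[Tuple[object, int]]:
    """Pair each frame's features with its label, keeping unanimous frames only."""
    labels = unanimous_labels(annotations)
    if len(features) != len(labels):
        raise ValueError("features and annotations must cover the same frames")
    return [(x, y) for x, y in zip(features, labels) if y is not None]


# Hypothetical annotations from three participants over five frames.
annotator_a = [1, 1, 0, 0, 1]
annotator_b = [1, 0, 0, 0, 1]
annotator_c = [1, 1, 0, 1, 1]

# Placeholder per-frame features; the paper derives these from the robot's
# posture, motion, and utterance, whose exact encoding is not given here.
frame_features = ["f0", "f1", "f2", "f3", "f4"]

dataset = common_parts(frame_features, [annotator_a, annotator_b, annotator_c])
print(dataset)
```

Frames 1 and 3 are dropped because the annotators disagree on them, so only the unanimous frames would reach training and evaluation.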

Notes

  1. http://www.aldebaran-robotics.com/

  2. http://voicetext.jp/

Author information

Corresponding author

Correspondence to Takaaki Sugiyama.

Copyright information

© 2014 Springer Science+Business Media New York

About this paper

Cite this paper

Sugiyama, T., Komatani, K., Sato, S. (2014). Predicting When People Will Speak to a Humanoid Robot. In: Mariani, J., Rosset, S., Garnier-Rizet, M., Devillers, L. (eds) Natural Interaction with Robots, Knowbots and Smartphones. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8280-2_17

  • DOI: https://doi.org/10.1007/978-1-4614-8280-2_17

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-8279-6

  • Online ISBN: 978-1-4614-8280-2

  • eBook Packages: Engineering, Engineering (R0)
