Predicting When People Will Speak to a Humanoid Robot

Sugiyama, Takaaki; Komatani, Kazunori; Sato, Satoshi

doi:10.1007/978-1-4614-8280-2_17

Abstract

We tackle the novel problem of predicting when a user is likely to begin speaking to a humanoid robot. Human speakers usually take the state of their addressee into account when choosing when to begin speaking, and our idea is to apply this convention to a system that interprets audio input. The proposed method makes this prediction by machine learning that uses the robot's behaviors (such as its posture, motion, and utterance) as input features. We created a data set manually annotated by three human participants, who indicated in real time whether or not they would be likely to begin speaking to the robot. We collected the parts to which all three participants gave the same label and used these parts as the training and evaluation data for machine learning. In an experimental evaluation, our model correctly predicted 88.5% of these common parts in an open test. This result is similar to the results of a cross-validation, indicating that our model does not depend on a specific training data set. A possible application of the model is the elimination of environmental noises that occur at times when a cooperative user is not likely to begin speaking to the robot.
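The data-selection step described in the abstract (keeping only the parts that all three annotators labeled identically, then scoring predictions on those parts) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the frame-based representation, the binary label encoding (1 = likely to begin speaking, 0 = not likely), the function names `unanimous_labels` and `accuracy_on_agreed`, and the toy data are all assumptions made for the example.

```python
from typing import List, Optional, Sequence


def unanimous_labels(annotations: Sequence[Sequence[int]]) -> List[Optional[int]]:
    """Return one label per frame where every annotator agrees, else None.

    Assumption: each annotator's labels are sampled on the same frame grid.
    """
    n_frames = len(annotations[0])
    if any(len(a) != n_frames for a in annotations):
        raise ValueError("all annotators must label the same number of frames")
    agreed: List[Optional[int]] = []
    for frame_labels in zip(*annotations):
        first = frame_labels[0]
        # Keep the frame only if all annotators gave the same label.
        agreed.append(first if all(lab == first for lab in frame_labels) else None)
    return agreed


def accuracy_on_agreed(predictions: Sequence[int],
                       agreed: Sequence[Optional[int]]) -> float:
    """Fraction of correctly predicted frames, counting only unanimous frames."""
    pairs = [(p, a) for p, a in zip(predictions, agreed) if a is not None]
    if not pairs:
        raise ValueError("no frames with unanimous labels")
    return sum(p == a for p, a in pairs) / len(pairs)


# Hypothetical toy annotations from three annotators over six frames.
annotator_a = [1, 1, 0, 0, 1, 0]
annotator_b = [1, 0, 0, 0, 1, 0]
annotator_c = [1, 1, 0, 1, 1, 0]

agreed = unanimous_labels([annotator_a, annotator_b, annotator_c])
print(agreed)  # [1, None, 0, None, 1, 0]

# Hypothetical model output for the same six frames.
predictions = [1, 1, 0, 0, 0, 0]
print(accuracy_on_agreed(predictions, agreed))  # 0.75
```

Frames on which the annotators disagree are excluded from both training and scoring in this sketch, so the reported accuracy is defined only over the unanimous parts, matching how the 88.5% figure is described.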

Author information

Authors and Affiliations

Graduate School of Engineering, Nagoya University, Nagoya, Japan
Takaaki Sugiyama, Kazunori Komatani & Satoshi Sato

Corresponding author

Correspondence to Takaaki Sugiyama.

Editor information

Editors and Affiliations

IMMI-CNRS, Orsay, France
Joseph Mariani
LIMSI-CNRS, Orsay, France
Sophie Rosset
IMMI-CNRS, Orsay, France
Martine Garnier-Rizet
LIMSI-CNRS, Orsay, France
Laurence Devillers

Cite this paper

Sugiyama, T., Komatani, K., Sato, S. (2014). Predicting When People Will Speak to a Humanoid Robot. In: Mariani, J., Rosset, S., Garnier-Rizet, M., Devillers, L. (eds) Natural Interaction with Robots, Knowbots and Smartphones. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8280-2_17

DOI: https://doi.org/10.1007/978-1-4614-8280-2_17
Published: 28 August 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8279-6
Online ISBN: 978-1-4614-8280-2
eBook Packages: Engineering, Engineering (R0)
