Voice Speech Interfaces

Cangelosi, Angelo; Ogata, Tetsuya

doi:10.1007/978-3-642-41610-1_28-1

Living reference work entry
First Online: 16 May 2018

Synonyms

Human-robot communication; Language; Symbol grounding

Definition

Voice speech interfaces concern the design and use of algorithms and tools, based on natural language and machine-learning methods, for human-robot communication.

Overview

A fundamental behavioral and cognitive capability of a robot interacting with a human user is speech, since spoken language is the primary means by which people communicate with each other. However, communication between people, and between humans and robots, is not based on speech alone. Rather, it is a rich multimodal process that combines spoken language with a variety of nonverbal behaviors such as eye gaze, hand gestures, tactile interaction, and emotional cues (Mavridis 2015; Cangelosi and Schlesinger 2015). Speech-based interfaces, complemented by multimodal communication, can contribute to forming a consistent and robust recognition process for the robot (and humans) by reducing ambiguity about the sensory...

References

Antunes A, Saponaro G, Morse A, Jamone L, Santos-Victor J, Cangelosi A (2017) Learn, plan, remember: a developmental robot architecture for task solving. In: Proceedings of 2017 IEEE joint international conference on development and learning and epigenetic robotics (ICDL-EpiRob), Lisbon
Araki T, Nakamura T, Nagai T, Funakoshi K, Nakano M, Iwahashi N (2011) Autonomous acquisition of multimodal information for online object concept formation by robots. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1540–1547
Cangelosi A (2010) Grounding language in action and perception: from cognitive agents to humanoid robots. Phys Life Rev 7(2):139–151
Cangelosi A, Ogata T (2017) Language and speech in humanoid robots. In: Vadakkepat P, Goswami A (eds) Humanoid robotics: a reference. Springer
Cangelosi A, Schlesinger M (2015) Developmental robotics: from babies to robots. MIT Press, Cambridge, MA (see chapters 7 and 8)
Cangelosi A, Metta G, Sagerer G, Nolfi S, Nehaniv CL, Fischer K, Tani J, Belpaeme T, Sandini G, Fadiga L, Wrede B, Rohlfing K, Tuci E, Dautenhahn K, Saunders J, Zeschel A (2010) Integration of action and language knowledge: a roadmap for developmental robotics. IEEE Trans Auton Ment Dev 2(3):167–195
Celikkanat H, Orhan G, Pugeault N, Guerin F, Erol S, Kalkan S (2014) Learning and using context on a humanoid robot using latent Dirichlet allocation. In: Joint IEEE international conferences on development and learning and epigenetic robotics (ICDL-EpiRob), pp 201–207
Hara I, Asano F, Asoh H, Ogata J, Ichimura N, Kawai Y (2004) Robust speech interface based on audio and video information fusion for humanoid HRP-2. In: 2004 IEEE/RSJ international conference on intelligent robots and systems (IROS) (IEEE Cat. No.04CH37566), vol 3, pp 2404–2410
Hayashi K, Kanda T, Miyashita T, Ishiguro H, Hagita N (2008) Robot manzai: robot conversation as a passive–social medium. Int J Humanoid Rob 5(01):67–86
Ishiguro H (2007) Android science. In: Robotics research. Springer, Berlin/Heidelberg, pp 118–127
Kennedy J, de Greeff J, Read R, Baxter P, Belpaeme T (2014) The Chatbot strikes back. In: Proceedings of the 9th IEEE/ACM conference on human-robot interaction (HRI2014). IEEE/ACM Press, Bielefeld
Lallee S, Ford Dominey P (2013) Multi-modal convergence maps: from body schema and self-representation to mental imagery. Adapt Behav 21:274
Mavridis N (2015) A review of verbal and non-verbal human–robot interactive communication. Robot Auton Syst 63:22–35
Morse A, Cangelosi A (2017) Why are there developmental stages in language learning? A developmental robotics model of language development. Cogn Sci 41:32
Morse AF, de Greeff J, Belpaeme T, Cangelosi A (2010) Epigenetic robotics architecture (ERA). IEEE Trans Auton Ment Dev 2(4):325–339
Morse A, Belpaeme T, Smith L, Cangelosi A (2015) Posture affects how robots and infants map words to objects. PLoS One 10(3)
Nakamura T, Ando Y, Nagai T, Kaneko M (2015) Concept formation by robots using an infinite mixture of models. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)
Nefian AV, Liang L, Pi X, Liu X, Murphy K (2002) Dynamic Bayesian networks for audio-visual speech recognition. EURASIP J Appl Sig Process 2002(11):1274–1288
Noda K, Arie H, Suga Y, Ogata T (2014) Multimodal integration learning of robot behavior using deep neural networks. Robot Auton Syst 62(6):721–736
Noda K, Yamaguchi Y, Nakadai K, Okuno HG, Ogata T (2015) Audio-visual speech recognition using deep learning. Appl Intell 42(4):722–737
Pastra K, Aloimonos Y (2012) The minimalist grammar of action. Philos Trans R Soc Lond B Biol Sci 367(1585):103–117
Samuelson LK, Smith LB, Perry LK, Spencer JP (2011) Grounding word learning in space. PLoS One 6(12):e28095
Shiomi M, Sakamoto D, Kanda T, Ishi CT, Ishiguro H, Hagita N (2008) A semi-autonomous communication robot: a field trial at a train station. In: Proceedings of the 3rd ACM/IEEE international conference on human robot interaction, ACM, pp 303–310
Steels L (ed) (2012) Experiments in cultural language evolution, vol 3. John Benjamins Publishing, Amsterdam/Philadelphia
Sugita Y, Tani J (2005) Learning semantic combinatoriality from the interaction between linguistic and behavioral processes. Adapt Behav 13(1):33–52
Taniguchi T, Nagai T, Nakamura T, Iwahashi N, Ogata T, Asoh H (2016) Symbol emergence in robotics: a survey
Tikhanoff V, Cangelosi A, Metta G (2011) Language understanding in humanoid robots: iCub simulation experiments. IEEE Trans Auton Ment Dev 3(1):17–29
Tuci E, Ferrauto T, Zeschel A, Massera G, Nolfi S (2011) An experiment on behaviour generalisation and the emergence of linguistic compositionality in evolving robots. IEEE Trans Auton Ment Dev 3(2):176–189
Twomey KE, Morse AF, Cangelosi A, Horst J (2016) Children’s referent selection and word learning: insights from a developmental robotic system. Interact Stud 17(1):101–127
Wallace RS (2009) The anatomy of A.L.I.C.E. In: Epstein R, Roberts G, Beber G (eds) Parsing the Turing test. Springer Science+Business Media, London, pp 181–210
Yamashita Y, Tani J (2008) Emergence of functional hierarchy in a multiple timescale neural network model: a humanoid robot experiment. PLoS Comput Biol 4(11):e1000220
Yang Y, Li Y, Fermüller C, Aloimonos Y (2015) Robot learning manipulation action plans by “Watching” unconstrained videos from the World Wide Web. In: The twenty-ninth AAAI conference on artificial intelligence
Zhong J, Cangelosi A, Ogata T (2017) Understanding natural language sentences with word embedding and multi-modal interaction. In: Proceedings of 2017 IEEE joint international conference on development and learning and epigenetic robotics (ICDL-EpiRob), Lisbon

Author information

Authors and Affiliations

Centre for Robotics and Neural Systems, School of Computing and Mathematics, Plymouth University, Plymouth, UK
Angelo Cangelosi
Faculty of Science and Engineering, Waseda University, Tokyo, Japan
Tetsuya Ogata

Corresponding author

Correspondence to Angelo Cangelosi.

Editor information

Editors and Affiliations

Dept. of Mechanical Engineering Blk 1E Gillman Hts #17-43, National University of Singapore, Singapore, Singapore
Marcelo H Ang
Department of Computer Science, Stanford University, Stanford, California, USA
Oussama Khatib
Dipartimento di Informatica e Sistemistica, Università di Napoli Federico II, Napoli, Italy
Bruno Siciliano

Section Editor information

School of Mechanical Engineering, Korea University of Technology & Education, Cheon-An, Chungcheong, Republic of Korea
Jee-Hwan Ryu

Cite this entry

Cangelosi, A., Ogata, T. (2018). Voice Speech Interfaces. In: Ang, M., Khatib, O., Siciliano, B. (eds) Encyclopedia of Robotics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41610-1_28-1

DOI: https://doi.org/10.1007/978-3-642-41610-1_28-1
Received: 04 July 2017
Accepted: 29 January 2018
Published: 16 May 2018
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41610-1
Online ISBN: 978-3-642-41610-1
eBook Packages: Springer Reference Engineering; Reference Module Computer Science and Engineering
