Definition
Voice speech interfaces concern the design and use of algorithms and tools, based on natural language processing and machine-learning methods, for human-robot communication.
Overview
A fundamental behavioral and cognitive capability of a robot interacting with a human user is speech, since spoken language is the primary means by which people communicate with each other. However, communication between people, and between humans and robots, is not based on speech alone. Rather, it is a rich multimodal process that combines spoken language with a variety of nonverbal behaviors such as eye gaze, hand gestures, tactile interaction, and emotional cues (Mavridis 2015; Cangelosi and Schlesinger 2015). Speech-based interfaces, complemented by multimodal communication, can contribute to forming a consistent and robust recognition process for the robot (and humans) by reducing ambiguity about the sensory...
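To illustrate how a nonverbal cue can reduce the ambiguity left by speech alone, the following is a minimal sketch of simple late fusion, assuming that the speech and gaze channels are conditionally independent so their likelihoods can be multiplied and renormalized. It is not the method of any work cited here (for example, Nefian et al. 2002 use dynamic Bayesian networks and Noda et al. 2015 use deep learning for audio-visual speech recognition). The scene, the object names, the recognizer confidences, the Gaussian gaze model, and its `sigma` parameter are all hypothetical.

```python
import math

# Hypothetical scene: two cups and a ball, each at an angle (degrees)
# relative to the robot's head. Names and numbers are illustrative only.
OBJECTS = {
    "red_cup": {"category": "cup", "angle": -30.0},
    "blue_cup": {"category": "cup", "angle": 25.0},
    "ball": {"category": "ball", "angle": 0.0},
}


def speech_likelihood(word_scores, obj):
    """P(utterance | object): recognizer confidence for the object's category word."""
    return word_scores.get(obj["category"], 1e-3)  # small floor for unmentioned categories


def gaze_likelihood(gaze_angle, obj, sigma=15.0):
    """P(gaze | object): assumed Gaussian fall-off with angular distance to the object."""
    diff = gaze_angle - obj["angle"]
    return math.exp(-0.5 * (diff / sigma) ** 2)


def fuse(word_scores, gaze_angle=None):
    """Late fusion: multiply per-modality likelihoods (assumed independent), then normalize."""
    scores = {}
    for name, obj in OBJECTS.items():
        s = speech_likelihood(word_scores, obj)
        if gaze_angle is not None:
            s *= gaze_likelihood(gaze_angle, obj)
        scores[name] = s
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical recognizer output for the utterance "give me the cup".
    asr = {"cup": 0.8, "ball": 0.2}
    speech_only = fuse(asr)
    with_gaze = fuse(asr, gaze_angle=20.0)  # the user looks slightly to the right
    print({k: round(v, 3) for k, v in speech_only.items()})
    print({k: round(v, 3) for k, v in with_gaze.items()})
```

With speech alone the two cups receive equal probability, so the referent of "the cup" stays ambiguous; adding the gaze likelihood shifts most of the probability mass to the cup the user is looking at. Multiplying likelihoods is the simplest fusion rule; learned models such as those cited above can instead capture dependencies between the modalities.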
References
Antunes A, Saponaro G, Morse A, Jamone L, Santos-Victor J, Cangelosi A (2017) Learn, plan, remember: a developmental robot architecture for task solving. In: Proceedings of 2017 IEEE joint international conference on development and learning and epigenetic robotics (ICDL-EpiRob), Lisbon
Araki T, Nakamura T, Nagai T, Funakoshi K, Nakano M, Iwahashi N (2011) Autonomous acquisition of multimodal information for online object concept formation by robots. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1540–1547
Cangelosi A (2010) Grounding language in action and perception: from cognitive agents to humanoid robots. Phys Life Rev 7(2):139–151
Cangelosi A, Ogata T (2017) Language and speech in humanoid robots. In: Vadakkepat P, Goswami A (eds) Humanoid robotics: a reference. Springer
Cangelosi A, Schlesinger M (2015) Developmental robotics: from babies to robots. MIT Press, Cambridge, MA (see chapters 7 and 8)
Cangelosi A, Metta G, Sagerer G, Nolfi S, Nehaniv CL, Fischer K, Tani J, Belpaeme T, Sandini G, Fadiga L, Wrede B, Rohlfing K, Tuci E, Dautenhahn K, Saunders J, Zeschel A (2010) Integration of action and language knowledge: a roadmap for developmental robotics. IEEE Trans Auton Ment Dev 2(3):167–195
Celikkanat H, Orhan G, Pugeault N, Guerin F, Erol S, Kalkan S (2014) Learning and using context on a humanoid robot using latent Dirichlet allocation. In: Joint IEEE international conferences on development and learning and epigenetic robotics (ICDL-EpiRob), pp 201–207
Hara I, Asano F, Asoh H, Ogata J, Ichimura N, Kawai Y (2004) Robust speech interface based on audio and video information fusion for humanoid HRP-2. In: 2004 IEEE/RSJ international conference on intelligent robots and systems (IROS) (IEEE Cat. No.04CH37566), vol 3, pp 2404–2410
Hayashi K, Kanda T, Miyashita T, Ishiguro H, Hagita N (2008) Robot manzai: robot conversation as a passive–social medium. Int J Humanoid Rob 5(1):67–86
Ishiguro H (2007) Android science. In: Robotics research. Springer, Berlin/Heidelberg, pp 118–127
Kennedy J, de Greeff J, Read R, Baxter P, Belpaeme T (2014) The Chatbot strikes back. In: Proceedings of the 9th IEEE/ACM conference on human-robot interaction (HRI2014). IEEE/ACM Press, Bielefeld
Lallee S, Ford Dominey P (2013) Multi-modal convergence maps: from body schema and self-representation to mental imagery. Adapt Behav 21:274
Mavridis N (2015) A review of verbal and non-verbal human–robot interactive communication. Robot Auton Syst 63:22–35
Morse A, Cangelosi A (2017) Why are there developmental stages in language learning? A developmental robotics model of language development. Cogn Sci 41:32
Morse AF, de Greeff J, Belpaeme T, Cangelosi A (2010) Epigenetic robotics architecture (ERA). IEEE Trans Auton Ment Dev 2(4):325–339
Morse A, Belpaeme T, Smith L, Cangelosi A (2015) Posture affects how robots and infants map words to objects. PLoS One 10(3)
Nakamura T, Ando Y, Nagai T, Kaneko M (2015) Concept formation by robots using an infinite mixture of models. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)
Nefian AV, Liang L, Pi X, Liu X, Murphy K (2002) Dynamic Bayesian networks for audio-visual speech recognition. EURASIP J Appl Sig Process 2002(11):1274–1288
Noda K, Arie H, Suga Y, Ogata T (2014) Multimodal integration learning of robot behavior using deep neural networks. Robot Auton Syst 62(6):721–736
Noda K, Yamaguchi Y, Nakadai K, Okuno HG, Ogata T (2015) Audio-visual speech recognition using deep learning. Appl Intell 42(4):722–737
Pastra K, Aloimonos Y (2012) The minimalist grammar of action. Philos Trans R Soc Lond B Biol Sci 367(1585):103–117
Samuelson LK, Smith LB, Perry LK, Spencer JP (2011) Grounding word learning in space. PLoS One 6(12):e28095
Shiomi M, Sakamoto D, Kanda T, Ishi CT, Ishiguro H, Hagita N (2008) A semi-autonomous communication robot: a field trial at a train station. In: Proceedings of the 3rd ACM/IEEE international conference on human robot interaction, ACM, pp 303–310
Steels L (ed) (2012) Experiments in cultural language evolution, vol 3. John Benjamins Publishing, Amsterdam/Philadelphia
Sugita Y, Tani J (2005) Learning semantic combinatoriality from the interaction between linguistic and behavioral processes. Adapt Behav 13(1):33–52
Taniguchi T, Nagai T, Nakamura T, Iwahashi N, Ogata T, Asoh H (2016) Symbol emergence in robotics: a survey
Tikhanoff V, Cangelosi A, Metta G (2011) Language understanding in humanoid robots: iCub simulation experiments. IEEE Trans Auton Ment Dev 3(1):17–29
Tuci E, Ferrauto T, Zeschel A, Massera G, Nolfi S (2011) An experiment on behaviour generalisation and the emergence of linguistic compositionality in evolving robots. IEEE Trans Auton Ment Dev 3(2):176–189
Twomey KE, Morse AF, Cangelosi A, Horst J (2016) Children’s referent selection and word learning: insights from a developmental robotic system. Interact Stud 17(1):101–127
Wallace RS (2009) The anatomy of A.L.I.C.E. In: Epstein R, Roberts G, Beber G (eds) Parsing the Turing test. Springer Science+Business Media, London, pp 181–210
Yamashita Y, Tani J (2008) Emergence of functional hierarchy in a multiple timescale neural network model: a humanoid robot experiment. PLoS Comput Biol 4(11):e1000220
Yang Y, Li Y, Fermüller C, Aloimonos Y (2015) Robot learning manipulation action plans by “Watching” unconstrained videos from the World Wide Web. In: The twenty-ninth AAAI conference on artificial intelligence
Zhong J, Cangelosi A, Ogata T (2017) Understanding natural language sentences with word embedding and multi-modal interaction. In: Proceedings of 2017 IEEE joint international conference on development and learning and epigenetic robotics (ICDL-EpiRob), Lisbon
Copyright information
© 2018 Springer-Verlag GmbH Germany, part of Springer Nature
About this entry
Cite this entry
Cangelosi, A., Ogata, T. (2018). Voice Speech Interfaces. In: Ang, M., Khatib, O., Siciliano, B. (eds) Encyclopedia of Robotics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41610-1_28-1
DOI: https://doi.org/10.1007/978-3-642-41610-1_28-1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41610-1
Online ISBN: 978-3-642-41610-1
eBook Packages: Springer Reference Engineering; Reference Module Computer Science and Engineering