Voice Speech Interfaces

  • Living reference work entry

Synonyms

Human-robot communication; Language; Symbol grounding

Definition

Voice speech interfaces concern the design and use of algorithms and tools, based on natural language and machine-learning methods, for human-robot communication.

Overview

Speech is a fundamental behavioral and cognitive capability of a robot interacting with a human user, since spoken language is the primary means by which people communicate with each other. However, communication between people, and between humans and robots, is not based on speech alone. Rather, it is a rich multimodal process that combines spoken language with a variety of nonverbal behaviors such as eye gaze, hand gestures, tactile interaction, and emotional cues (Mavridis 2015; Cangelosi and Schlesinger 2015). Speech-based interfaces, complemented by multimodal communication, can contribute to forming a consistent and robust recognition process for the robot (and humans) by reducing ambiguity about the sensory...

References

  • Antunes A, Saponaro G, Morse A, Jamone L, Santos-Victor J, Cangelosi A (2017) Learn, plan, remember: a developmental robot architecture for task solving. In: Proceedings of 2017 IEEE joint international conference on development and learning and epigenetic robotics (ICDL-EpiRob), Lisbon

  • Araki T, Nakamura T, Nagai T, Funakoshi K, Nakano M, Iwahashi N (2011) Autonomous acquisition of multimodal information for online object concept formation by robots. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1540–1547

  • Cangelosi A (2010) Grounding language in action and perception: from cognitive agents to humanoid robots. Phys Life Rev 7(2):139–151

  • Cangelosi A, Ogata T (2017) Language and speech in humanoid robots. In: Vadakkepat P, Goswami A (eds) Humanoid robotics: a reference. Springer

  • Cangelosi A, Schlesinger M (2015) Developmental robotics: from babies to robots. MIT Press, Cambridge, MA (see chapters 7 and 8)

  • Cangelosi A, Metta G, Sagerer G, Nolfi S, Nehaniv CL, Fischer K, Tani J, Belpaeme T, Sandini G, Fadiga L, Wrede B, Rohlfing K, Tuci E, Dautenhahn K, Saunders J, Zeschel A (2010) Integration of action and language knowledge: a roadmap for developmental robotics. IEEE Trans Auton Ment Dev 2(3):167–195

  • Celikkanat H, Orhan G, Pugeault N, Guerin F, Erol S, Kalkan S (2014) Learning and using context on a humanoid robot using latent Dirichlet allocation. In: Joint IEEE international conferences on development and learning and epigenetic robotics (ICDL-EpiRob), pp 201–207

  • Hara I, Asano F, Asoh H, Ogata J, Ichimura N, Kawai Y (2004) Robust speech interface based on audio and video information fusion for humanoid HRP-2. In: 2004 IEEE/RSJ international conference on intelligent robots and systems (IROS) (IEEE Cat. No.04CH37566), vol 3, pp 2404–2410

  • Hayashi K, Kanda T, Miyashita T, Ishiguro H, Hagita N (2008) Robot manzai: robot conversation as a passive–social medium. Int J Humanoid Rob 5(01):67–86

  • Ishiguro H (2007) Android science. In: Robotics research. Springer, Berlin/Heidelberg, pp 118–127

  • Kennedy J, de Greeff J, Read R, Baxter P, Belpaeme T (2014) The Chatbot strikes back. In: Proceedings of the 9th IEEE/ACM conference on human-robot interaction (HRI2014). IEEE/ACM Press, Bielefeld

  • Lallee S, Ford Dominey P (2013) Multi-modal convergence maps: from body schema and self-representation to mental imagery. Adapt Behav 21:274

  • Mavridis N (2015) A review of verbal and non-verbal human–robot interactive communication. Robot Auton Syst 63:22–35

  • Morse A, Cangelosi A (2017) Why are there developmental stages in language learning? A developmental robotics model of language development. Cogn Sci 41:32

  • Morse AF, de Greeff J, Belpaeme T, Cangelosi A (2010) Epigenetic robotics architecture (ERA). IEEE Trans Auton Ment Dev 2(4):325–339

  • Morse A, Belpaeme T, Smith L, Cangelosi A (2015) Posture affects how robots and infants map words to objects. PLoS One 10(3)

  • Nakamura T, Ando Y, Nagai T, Kaneko M (2015) Concept formation by robots using an infinite mixture of models. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

  • Nefian AV, Liang L, Pi X, Liu X, Murphy K (2002) Dynamic Bayesian networks for audio-visual speech recognition. EURASIP J Appl Sig Process 2002(11):1274–1288

  • Noda K, Arie H, Suga Y, Ogata T (2014) Multimodal integration learning of robot behavior using deep neural networks. Robot Auton Syst 62(6):721–736

  • Noda K, Yamaguchi Y, Nakadai K, Okuno HG, Ogata T (2015) Audio-visual speech recognition using deep learning. Appl Intell 42(4):722–737

  • Pastra K, Aloimonos Y (2012) The minimalist grammar of action. Philos Trans R Soc Lond B Biol Sci 367(1585):103–117

  • Samuelson LK, Smith LB, Perry LK, Spencer JP (2011) Grounding word learning in space. PLoS One 6(12):e28095

  • Shiomi M, Sakamoto D, Kanda T, Ishi CT, Ishiguro H, Hagita N (2008) A semi-autonomous communication robot: a field trial at a train station. In: Proceedings of the 3rd ACM/IEEE international conference on human robot interaction, ACM, pp 303–310

  • Steels L (ed) (2012) Experiments in cultural language evolution, vol 3. John Benjamins Publishing, Amsterdam/Philadelphia

  • Sugita Y, Tani J (2005) Learning semantic combinatoriality from the interaction between linguistic and behavioral processes. Adapt Behav 13(1):33–52

  • Taniguchi T, Nagai T, Nakamura T, Iwahashi N, Ogata T, Asoh H (2016) Symbol emergence in robotics: a survey

  • Tikhanoff V, Cangelosi A, Metta G (2011) Language understanding in humanoid robots: iCub simulation experiments. IEEE Trans Auton Ment Dev 3(1):17–29

  • Tuci E, Ferrauto T, Zeschel A, Massera G, Nolfi S (2011) An experiment on behaviour generalisation and the emergence of linguistic compositionality in evolving robots. IEEE Trans Auton Ment Dev 3(2):176–189

  • Twomey KE, Morse AF, Cangelosi A, Horst J (2016) Children’s referent selection and word learning: insights from a developmental robotic system. Interact Stud 17(1):101–127

  • Wallace RS (2009) The anatomy of A.L.I.C.E. In: Epstein R, Roberts G, Beber G (eds) Parsing the Turing test. Springer Science+Business Media, London, pp 181–210

  • Yamashita Y, Tani J (2008) Emergence of functional hierarchy in a multiple timescale neural network model: a humanoid robot experiment. PLoS Comput Biol 4(11):e1000220

  • Yang Y, Li Y, Fermüller C, Aloimonos Y (2015) Robot learning manipulation action plans by “Watching” unconstrained videos from the World Wide Web. In: The twenty-ninth AAAI conference on artificial intelligence

  • Zhong J, Cangelosi A, Ogata T (2017) Understanding natural language sentences with word embedding and multi-modal interaction. In: Proceedings of 2017 IEEE joint international conference on development and learning and epigenetic robotics (ICDL-EpiRob), Lisbon

Author information

Corresponding author

Correspondence to Angelo Cangelosi.

Copyright information

© 2018 Springer-Verlag GmbH Germany, part of Springer Nature

About this entry

Cite this entry

Cangelosi, A., Ogata, T. (2018). Voice Speech Interfaces. In: Ang, M., Khatib, O., Siciliano, B. (eds) Encyclopedia of Robotics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41610-1_28-1

  • DOI: https://doi.org/10.1007/978-3-642-41610-1_28-1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41610-1

  • Online ISBN: 978-3-642-41610-1

  • eBook Packages: Springer Reference Engineering; Reference Module Computer Science and Engineering
