Abstract

Conversational systems play an important role in scenarios without a keyboard, e.g., when talking to a robot. Communication in human-robot interaction (HRI) ultimately involves a combination of verbal and non-verbal inputs and outputs. HRI systems must process verbal and non-verbal observations and execute verbal and non-verbal actions in parallel in order to interpret and produce synchronized behaviours. Developing such systems involves integrating potentially many components and ensuring complex interaction and synchronization among them. Most work on spoken dialogue system development uses pipeline architectures. Some exceptions are [1, 17], which execute system components in parallel (weakly-coupled or tightly-coupled architectures). The latter are more promising for building adaptive systems, which is one of the goals of contemporary research.
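To make the contrast with a pipeline concrete, the sketch below shows one way a parallel, event-based design can be organized. It is a minimal illustration only, not the authors' implementation: the EventBus and run_component helpers, the handlers, and topic names such as "speech.recognized" are invented for exposition. Components run in their own threads and communicate solely through published events, so verbal and non-verbal observations are processed concurrently rather than in a fixed sequence.

```python
import queue
import threading
import time
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Event:
    topic: str      # e.g. "speech.recognized", "gesture.detected" (invented names)
    payload: object

class EventBus:
    """Minimal publish-subscribe bus: components exchange events
    instead of calling each other directly."""
    def __init__(self):
        self._subscribers = defaultdict(list)
        self._lock = threading.Lock()

    def subscribe(self, topic, q):
        with self._lock:
            self._subscribers[topic].append(q)

    def publish(self, event):
        with self._lock:
            targets = list(self._subscribers[event.topic])
        for q in targets:
            q.put(event)

def run_component(bus, name, in_topics, handler):
    """Start one component in its own thread; it consumes events on
    its input topics and may publish follow-up events."""
    q = queue.Queue()
    for topic in in_topics:
        bus.subscribe(topic, q)

    def loop():
        while True:
            handler(bus, q.get())

    threading.Thread(target=loop, name=name, daemon=True).start()

# Hypothetical handlers standing in for real ASR/vision/dialogue modules.
def dialogue_manager(bus, event):
    # Fuses verbal and non-verbal observations; here it just reacts to each.
    if event.topic == "speech.recognized":
        bus.publish(Event("tts.say", f"You said: {event.payload}"))
    elif event.topic == "gesture.detected":
        bus.publish(Event("motion.play", event.payload))

def output_stub(bus, event):
    print(f"[{event.topic}] {event.payload}")

if __name__ == "__main__":
    bus = EventBus()
    run_component(bus, "dm", ["speech.recognized", "gesture.detected"], dialogue_manager)
    run_component(bus, "out", ["tts.say", "motion.play"], output_stub)
    # Simulated perception events arriving concurrently from two channels.
    bus.publish(Event("speech.recognized", "hello robot"))
    bus.publish(Event("gesture.detected", "wave"))
    time.sleep(0.5)  # let the worker threads drain their queues
```

Because components interact only through the bus, a module (e.g., a gesture recognizer) can be added or swapped without changing the others, which is what makes loosely-coupled architectures attractive for adaptive systems.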

Parts of the research reported on in this paper were performed in the context of the EU-FP7 project ALIZ-E (ICT-248116), which develops embodied cognitive robots for believable any-depth affective interactions with young users over an extended and possibly discontinuous period [2].


References

1. Acapela website. http://www.acapela-group.com/index.html

2. ALIZ-E website. http://aliz-e.org/

3. Mary TTS website. http://mary.dfki.de/

4. OpenCCG website. http://openccg.sourceforge.net/

5. OpenCV library website. http://opencv.willowgarage.com

6. Baillie, J.: URBI: Towards a universal robotic low-level programming language. In: 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3219–3224. IEEE (2005)

7. Baldridge, J., Kruijff, G.J.: Multi-modal combinatory categorial grammar. In: Proceedings of the 10th Annual Meeting of the European Association for Computational Linguistics (2003)

8. Beck, A., Cañamero, L., Bard, K.: Towards an affect space for robots to display emotional body language. In: Proceedings of the 19th IEEE International Symposium on Robot and Human Interactive Communication, Ro-Man 2010, pp. 464–469. IEEE (2010)

9. Beck, A., Hiolle, A., Mazel, A., Cañamero, L.: Interpretation of emotional body language displayed by robots. In: Proceedings of the 3rd International Workshop on Affective Interaction in Natural Environments, AFFINE '10, pp. 37–42. ACM, New York, NY, USA (2010). DOI 10.1145/1877826.1877837

10. Bradski, G., Davis, J.: Motion segmentation and pose recognition with motion history gradients. Machine Vision and Applications 13, 174–184 (2002)

11. Cuayáhuitl, H., Renals, S., Lemon, O., Shimodaira, H.: Evaluation of a hierarchical reinforcement learning spoken dialogue system. Computer Speech and Language 24(2), 395–429 (2010). DOI 10.1016/j.csl.2009.07.001

12. Dekens, T., Verhelst, W.: On the noise robustness of voice activity detection algorithms. In: Proc. of InterSpeech, Florence, Italy (2011)

13. Gerosa, M., Giuliani, D., Brugnara, F.: Acoustic variability and automatic recognition of children's speech. Speech Communication 49, 847–860 (2007)

14. Hawes, N., Wyatt, J.L., Sloman, A., Sridharan, M., Dearden, R., Jacobsson, H., Kruijff, G.: Architecture and representations. In: H.I. Christensen, A. Sloman, G. Kruijff, J. Wyatt (eds.) Cognitive Systems, pp. 53–95. Published online at http://www.cognitivesystems.org/cosybook/ (2009)

15. Jones, M.J., Rehg, J.M.: Statistical color models with application to skin detection. Int. J. Comput. Vision 46, 81–96 (2002)

16. Larsson, S., Traum, D.: Information state and dialogue management in the TRINDI dialogue move engine toolkit. Natural Language Engineering 5(3–4), 323–340 (2000)

17. Lemon, O., Bracy, A., Gruenstein, A., Peters, S.: The WITAS multi-modal dialogue system I. In: EUROSPEECH, pp. 1559–1562. Aalborg, Denmark (2001)

18. Nicolao, M., Cosi, P.: Comparing SPHINX vs. SONIC Italian children speech recognition systems. In: 7th Conference of the Italian Association of Speech Sciences (2011). Unpublished draft version

19. Schröder, M.: The SEMAINE API: Towards a standards-based framework for building emotion-oriented systems. Advances in Human-Computer Interaction 2010(319406) (2010). DOI 10.1155/2010/319406

20. Stiefelhagen, R., Ekenel, H., Fugen, C., Gieselmann, P., Holzapfel, H., Kraft, F., Nickel, K., Voit, M., Waibel, A.: Enabling multimodal human-robot interaction for the Karlsruhe humanoid robot, pp. 840–851 (2007)

21. Tesser, F., Zovato, E., Nicolao, M., Cosi, P.: Two vocoder techniques for neutral to emotional timbre conversion. In: Y. Sagisaka, K. Tokuda (eds.) 7th Speech Synthesis Workshop (SSW), pp. 130–135. ISCA, Kyoto, Japan (2010)

22. Zen, H., Nose, T., Yamagishi, J., Sako, S., Masuko, T., Black, A., Tokuda, K.: The HMM-based speech synthesis system (HTS) version 2.0. In: Proc. of ISCA SSW6, pp. 294–299 (2007)


Author information


Corresponding author

Correspondence to Ivana Kruijff-Korbayová.



Copyright information

© 2011 Springer Science+Business Media, LLC

About this paper

Cite this paper

Kruijff-Korbayová, I. et al. (2011). An Event-Based Conversational System for the Nao Robot. In: Delgado, R.C., Kobayashi, T. (eds) Proceedings of the Paralinguistic Information and its Integration in Spoken Dialogue Systems Workshop. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1335-6_14


  • DOI: https://doi.org/10.1007/978-1-4614-1335-6_14


  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-1334-9

  • Online ISBN: 978-1-4614-1335-6

  • eBook Packages: Engineering, Engineering (R0)
