The “FAME” Interactive Space

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3869)


This paper describes the “FAME” multi-modal demonstrator, which integrates multiple communication modes – vision, speech and object manipulation – by combining the physical and virtual worlds to provide support for multi-cultural or multi-lingual communication and problem solving.

The major challenges are automatic perception of human actions and understanding of dialogs between people from different cultural or linguistic backgrounds. The system acts as an information butler, which demonstrates context awareness using computer vision, speech and dialog modeling. The integrated computer-enhanced human-to-human communication has been publicly demonstrated at the FORUM2004 in Barcelona and at IST2004 in The Hague.

Specifically, the “Interactive Space” described here features an “Augmented Table” for multi-cultural interaction, which allows several simultaneous users to perform multi-modal, cross-lingual retrieval of audio-visual documents previously recorded by an “Intelligent Cameraman” during a week-long seminar.


Keywords: Interactive Space, Speech Recognition System, Word Error Rate, Broadcast News, Topic Detection





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  1. Universität Karlsruhe (TH), Germany
  2. Institut National Polytechnique de Grenoble (INPG), France
  3. Université Joseph Fourier (UJF), Grenoble, France
  4. Universitat Politecnica de Catalunya (UPC), Barcelona, Spain
