User Interaction in Mobile Navigation Applications

  • Kristiina Jokinen
Part of the Lecture Notes in Geoinformation and Cartography book series (LNGC)


The chapter focuses on cooperation and interaction in multimodal route navigation and provides an overview of the advantages and disadvantages of multimodal interaction in location-based services in general. The goal of the research has been to study methods and techniques for richer human-computer interaction, and to investigate the interplay between, and user preferences concerning, speech and tactile input modalities in a route navigation task. The chapter also surveys work on a mobile navigation application that allows the user to query public transportation routes using speech and pen-pointing gestures. The first version of the PDA-based navigation application, MUMS, was developed with Helsinki City public transportation as its domain, and the user can ask for timetable and navigation information either through natural-language questions or by clicking on the map. On the basis of user studies, we also discuss the individual modalities and their influence in interactive applications.
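The abstract's technical core is the fusion of two input modalities: a spoken query and a pen-pointing gesture on the map. As a rough, hypothetical illustration of that idea (not the MUMS implementation; the function names, data types, and the timing threshold are all assumptions), the following sketch binds the deictic word "here" in a recognized utterance to the most recent pen tap, provided the two inputs occur close together in time:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class PenGesture:
    # Map coordinates of a pointing gesture (e.g. WGS84 latitude/longitude).
    lat: float
    lon: float
    timestamp: float  # seconds, same clock as the speech recognizer

def fuse_inputs(utterance: str,
                gesture: Optional[PenGesture],
                speech_time: float = 0.0,
                max_lag: float = 4.0) -> Tuple[str, Optional[Tuple[float, float]]]:
    """Bind a deictic expression in a spoken query to the latest pen tap.

    Returns the utterance with 'here' replaced by the tap coordinates,
    plus the coordinates themselves, but only if the tap occurred within
    `max_lag` seconds of the speech; otherwise the query is left intact.
    (Simple substring matching is used; a real system would operate on
    recognizer tokens and handle several deictic forms.)
    """
    if gesture is None or "here" not in utterance:
        return utterance, None
    if abs(speech_time - gesture.timestamp) > max_lag:
        return utterance, None  # tap too old or too new: no binding
    coords = (gesture.lat, gesture.lon)
    resolved = utterance.replace(
        "here", f"({gesture.lat:.4f}, {gesture.lon:.4f})")
    return resolved, coords

# Example: a tap near Helsinki railway station shortly before speaking.
tap = PenGesture(60.1719, 24.9414, timestamp=1.2)
query, where = fuse_inputs("when does the next tram leave from here",
                           tap, speech_time=2.0)
```

Temporal proximity is only one possible integration criterion; multimodal systems in this literature (e.g. QuickSet, MATCH) typically combine it with unification of semantic feature structures from each modality.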


Keywords: Cognitive Load, Railway Station, Dialogue System, Multimodal Interface, Multimodal Interaction


References


  1. Allwood, J. (1976): Linguistic Communication as Action and Cooperation. Department of Linguistics, University of Gothenburg. Gothenburg Monographs in Linguistics 2.
  2. Andrews, T., Broughton, M., Estival, D. (2006): Implementing an Intelligent Multimedia Presentation Planner using an Agent-Based Architecture. In Procs of the Workshop on Multimodal Dialogues, Int Conf on Intelligent User Interfaces, Sydney, Australia.
  3. Baddeley, A.D. (1992): Working Memory. Science, 255:556-559.
  4. Belvin, R., Burns, R., Hein, C. (2001): Development of the HRL Route Navigation Dialogue System. In Procs of the 1st Int Conf on Human Language Technology Research, paper H01-1016.
  5. Berthold, A., Jameson, A. (1999): Interpreting symptoms of cognitive load in speech input. In Kay, J. ed. Procs of the 7th Int Conf on User Modeling (UM99), Springer, Wien, 235-244.
  6. Bolt, R.A. (1980): "Put-That-There": Voice and gesture at the graphics interface. Computer Graphics, 14(3): 262-270.
  7. Cheng, H., Cavedon, L., Dale, R. (2004): Generating Navigation Information Based on the Driver's Route Knowledge. In Gambäck, B. and Jokinen, K. eds. Procs of the DUMAS Final Workshop Robust and Adaptive Information Processing for Mobile Speech Interfaces, COLING Satellite Workshop, Geneva, Switzerland, 31-38.
  8. Cheyer, A., Julia, L. (1995): Multimodal Maps: An Agent-based Approach. In Procs of the Int Conf on Cooperative Multimodal Communication (CMC/95), Eindhoven, The Netherlands.
  9. Clark, H., Schaefer, E.F. (1989): Contributing to Discourse. Cognitive Science, 13:259-294.
  10. Clark, H., Wilkes-Gibbs, D. (1986): Referring as a collaborative process. Cognition, 22:1-39.
  11. Cohen, P.R., Johnston, M., McGee, D.R., Oviatt, S.L., Pittman, J., Smith, I., Chen, L., Clow, J. (1997): QuickSet: Multimodal interaction for distributed applications. In Procs of the 5th Int Multimedia Conf (Multimedia '97), ACM Press, Seattle, WA, 31-40.
  12. Cohen, P.R., Levesque, H.J. (1991): Teamwork. Noûs, 25(4):487-512.
  13. Cowan, N. (2001): The Magical Number 4 in Short-Term Memory: A Reconsideration of Mental Storage Capacity. Behavioral and Brain Sciences, 24(1): 87-185.
  14. Dale, R., Geldof, S., Prost, J.-P. (2005): Using Natural Language Generation in Automatic Route Description. Journal of Research and Practice in Information Technology, 37(1): 89-105.
  15. Danieli, M., Gerbino, E. (1995): Metrics for Evaluating Dialogue Strategies in a Spoken Language System. Working Notes, AAAI Spring Symposium Series, Stanford University.
  16. Dey, A.K. (2001): Understanding and using context. Personal and Ubiquitous Computing, 5:20-24.
  17. Gibbon, D., Mertins, I., Moore, R. (Eds.) (2000): Handbook of Multimodal and Spoken Dialogue Systems: Resources, Terminology, and Product Evaluation. Kluwer Academic Publishers.
  18. Granström, B. (Ed.) (2002): Multimodality in Language and Speech Systems. Kluwer, Dordrecht.
  19. Grice, H.P. (1975): Logic and Conversation. In Cole, P. and Morgan, J.L. eds. Syntax and Semantics, Vol. 3: Speech Acts. Academic Press.
  20. Grosz, B., Sidner, C.L. (1990): Plans for discourse. In Cohen, P.R., Morgan, J. and Pollack, M.E. eds. Intentions in Communication. MIT Press.
  21. Habel, C. (2003): Incremental Generation of Multimodal Route Instructions. In Natural Language Generation in Spoken and Written Language, AAAI Spring Symposium, Palo Alto.
  22. Harnad, S. (1990): The Symbol Grounding Problem. Physica D, 42: 335-346.
  23. Heeman, P.A., Hirst, G. (1995): Collaborating on referring expressions. Computational Linguistics, 21(3): 351-382.
  24. Hurtig, T. (2005): Multimodaalisen informaation hyödyntäminen reitinopastusdialogeissa ("Utilizing Multimodal Information in Route Guidance Dialogues"). Helsinki University of Technology, Department of Electrical and Communications Engineering.
  25. Hurtig, T., Jokinen, K. (2005): On Multimodal Route Navigation in PDAs. In Procs of the 2nd Baltic Conf on Human Language Technologies, Tallinn, Estonia, 261-266.
  26. Hurtig, T., Jokinen, K. (2006): Modality fusion in a route navigation system. In Procs of the IUI 2006 Workshop on Effective Multimodal Dialogue Interfaces, 19-24.
  27. Ikonen, V., Anttila, V., Petäkoski-Hult, T., Sotamaa, O., Ahonen, A., Schirokoff, A., Kaasinen, E. (2002): Key Usability and Ethical Issues in the NAVI programme (KEN). Deliverable 5: Adaptation of Technology and Usage Cultures, Part 1, Version 1.2.
  28. Johnston, M. (1998): Unification-based multimodal parsing. In Procs of the 36th Annual Meeting of the Association for Computational Linguistics, 624-630, Montreal, Canada.
  29. Johnston, M., Cohen, P.R., McGee, D., Oviatt, S., Pittman, J., Smith, I. (1997): Unification-based multimodal integration. In Procs of the 8th Conf of the European Chapter of the Association for Computational Linguistics, 281-288, Madrid, Spain.
  30. Johnston, M., Bangalore, S., Vasireddy, G., Stent, A., Ehlen, P., Walker, M., Whittaker, S., Maloor, P. (2002): MATCH: An architecture for multimodal dialogue systems. In Procs of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, 376-383.
  31. Jokinen, K. (2004): Communicative Competence and Adaptation in a Spoken Dialogue System. In Procs of the 8th Int Conf on Spoken Language Processing (ICSLP), CD-ROM, Jeju, Korea.
  32. Jokinen, K., Hurtig, T. (2006): User Expectations and Real Experience on a Multimodal Interactive System. In Procs of Interspeech 2006, Pittsburgh, US.
  33. Jokinen, K., Kerminen, A., Kaipainen, M., Jauhiainen, T., Wilcock, G., Turunen, M., Hakulinen, J., Kuusisto, J., Lagus, K. (2002): Adaptive Dialogue Systems: Interaction with Interact. In Jokinen, K. and McRoy, S. eds. Procs of the 3rd SIGdial Workshop on Discourse and Dialogue, Philadelphia, USA, 64-73.
  34. Jokinen, K., Raike, A. (2003): Multimodality – technology, visions and demands for the future. In Procs of the 1st Nordic Symposium on Multimodal Interfaces, Copenhagen, Denmark.
  35. Kaasinen, E. (2003): User needs for location-aware mobile services. Personal and Ubiquitous Computing, 7:70-79.
  36. Kaasinen, E., Luoma, J., Penttinen, M., Petäkoski-Hult, T., Södergård, R. (2001): Basics of Human-Centered Design in Personal Navigation. NAVI programme report.
  37. Klüter, A., Ndiaye, A., Kirchmann, H. (2000): Verbmobil from a Software Engineering Point of View: System Design and Software Integration. In Wahlster, W. ed. Verbmobil: Foundations of Speech-to-Speech Translation, Springer, Berlin.
  38. Kray, C., Laakso, K., Elting, C., Coors, V. (2003): Presenting route instructions on mobile devices. In Procs of IUI '03, 117-124, Miami Beach, FL, ACM Press.
  39. Lee, A. (2005): The Effect of Familiarity on Knowledge Synchronisation. In Heylen, D. and Marsella, S. eds. Mind Minding Agents. AISB Symposium on Mind-Minding Agents (track of Virtual Social Agents), University of Hertfordshire, Hatfield, England.
  40. Levin, S.L., Mohamed, F.B., Platek, S.M. (2005): Common ground for spatial cognition? A behavioral and fMRI study of sex differences in mental rotation and spatial working memory. Evolutionary Psychology, 3: 227-254.
  41. Maaß, W. (1995): From Visual Perception to Multimodal Communication: Incremental Route Descriptions. In McKevitt, P. ed. Integration of Natural Language and Vision Processing: Computational Models and Systems, Vol. 1, 68-82. Kluwer, Dordrecht.
  42. Maaß, W., Wazinski, P., Herzog, G. (1993): VITRA GUIDE: Multimodal Route Descriptions for Computer Assisted Vehicle Navigation. In Procs of the 6th Int Conf on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems (IEA/AIE-93), 144-147, Edinburgh, Scotland.
  43. Martin, J.-C. (1997): Towards 'intelligent' cooperation between modalities: the example of multimodal interaction with a map. In Procs of the IJCAI'97 Workshop on Intelligent Multimodal Systems.
  44. Maybury, M.T. (Ed.) (1993): Intelligent Multimedia Interfaces. AAAI Press, Menlo Park, CA.
  45. Maybury, M.T., Wahlster, W. (Eds.) (1998): Readings in Intelligent User Interfaces. Morgan Kaufmann, San Francisco, CA.
  46. Miller, G.A. (1956): The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63: 81-97.
  47. Mousavi, S.Y., Low, R., Sweller, J. (1995): Reducing Cognitive Load by Mixing Auditory and Visual Presentation Modes. Journal of Educational Psychology, 87(2): 319-334.
  48. Müller, C., Großmann-Hutter, B., Jameson, A., Rummer, R., Wittig, F. (2001): Recognizing time pressure and cognitive load on the basis of speech: An experimental study. In Bauer, M., Vassileva, J., Gmytrasiewicz, P. eds. Procs of the 8th Int Conf on User Modeling (UM2001), 24-33. Springer, Berlin.
  49. Möller, S. (2002): A New Taxonomy for the Quality of Telephone Services Based on Spoken Dialogue Systems. In Jokinen, K. and McRoy, S. eds. Procs of the 3rd SIGdial Workshop on Discourse and Dialogue, Philadelphia, USA.
  50. Neal, J.G., Shapiro, S.C. (1991): Intelligent Multi-media Interface Technology. In Sullivan, J.W., Tyler, S.W. eds. Intelligent User Interfaces, Frontier Series, ACM Press, New York, 11-43.
  51. Nigay, L., Coutaz, J. (1995): A generic platform for addressing the multimodal challenge. In Procs of the ACM CHI'95 Conf on Human Factors in Computing Systems, ACM Press, 98-105.
  52. Oviatt, S.L. (1997): Multimodal interactive maps: Designing for human performance. Human-Computer Interaction, Special Issue on Multimodal Interfaces, 12: 93-129.
  53. Oviatt, S., De Angeli, A., Kuhn, K. (1997): Integration and synchronization of input modes during multimodal human-computer interaction. In Procs of Human Factors in Computing Systems (CHI'97), 415-422, ACM Press, New York.
  54. Oviatt, S., Cohen, P.R., Wu, L., Vergo, J., Duncan, L., Suhm, B., Bers, J., Holzman, T., Winograd, T., Landay, J., Larson, J., Ferro, D. (2000): Designing the User Interface for Multimodal Speech and Pen-based Gesture Applications: State-of-the-Art Systems and Future Research Directions. Human-Computer Interaction, 15(4): 263-322.
  55. Oviatt, S., Coulston, R., Lunsford, R. (2004): When Do We Interact Multimodally? Cognitive Load and Multimodal Communication Patterns. In Procs of the 6th Int Conf on Multimodal Interfaces (ICMI 2004), Pennsylvania, USA.
  56. Robbins, D., Cutrell, E., Sarin, R., Horvitz, E. (2004): ZoneZoom: Map Navigation for Smartphones with Recursive View Segmentation. In Procs of AVI.
  57. Seneff, S., Hurley, E., Lau, R., Pao, C., Schmid, P., Zue, V. (1998): GALAXY-II: A reference architecture for conversational system development. In Procs of ICSLP 98, Sydney, Australia.
  58. Shepherd, G.M. (1988): Neurobiology. Oxford University Press, Oxford.
  59. Stenning, K., Oberlander, J. (1995): A cognitive theory of graphical and linguistic reasoning: logic and implementation. Cognitive Science, 19: 97-140.
  60. Sutcliffe, A. (2000): On the effective use and reuse of HCI knowledge. ACM Transactions on Computer-Human Interaction, Special Issue on Human-Computer Interaction in the New Millennium, Part 2, 7(2): 197-221.
  61. Tomko, M., Winter, S. (2006): Recursive Construction of Granular Route Directions. Journal of Spatial Science, 51(1): 101-115.
  62. Turunen, M., Hakulinen, J. (2003): Jaspis 2 - An Architecture for Supporting Distributed Spoken Dialogues. In Procs of Eurospeech 2003, 1913-1916.
  63. Tversky, B. (2000): Multiple Mental Spaces. Plenary talk at the Int Conf on Rationality and Irrationality, Austria. Available at: ∼bt/space/papers/rationality.pdf
  64. Tversky, B. (2003): Places: Points, Planes, Paths and Portions. In van der Zee, E. and Slack, J. eds. Representing Direction in Language and Space. Oxford University Press, 132-143.
  65. Vernier, F., Nigay, L. (2000): A Framework for the Combination and Characterization of Output Modalities. In Procs of the 7th Int Workshop on Design, Specification and Verification of Interactive Systems (DSV-IS'2000), Limerick, Ireland, 32-48.
  66. Wahlster, W., Reithinger, N., Blocher, A. (2001): SmartKom: Multimodal Communication with a Life-Like Character. In Procs of Eurospeech 2001, Aalborg, Denmark.
  67. Walker, M.A., Langkilde, I., Wright, J., Gorin, A., Litman, D.J. (2000): Learning to Predict Problematic Situations in a Spoken Dialogue System: Experiments with How May I Help You? In Procs of NAACL'00, Seattle, US, 210-217.
  68. Wasinger, R., Oliver, D., Heckmann, D., Braun, B., Brandherm, B., Stahl, C. (2003): Adapting Spoken and Visual Output for a Pedestrian Navigation System, based on given Situational Statements. In Procs of the Workshop on Adaptivity and User Modelling in Interactive Software Systems (ABIS), 343-346.
  69. Wickens, C.D., Hollands, J.G. (2000): Engineering Psychology and Human Performance. Prentice-Hall, Upper Saddle River, NJ.
  70. Yang, J., Stiefelhagen, R., Meier, U., Waibel, A. (1998): Visual tracking for multimodal human computer interaction. In Procs of CHI'98, ACM Press, 140-147.
  71. Yankelovich, N. (1996): How do users know what to say? Interactions, 3(6): 32-43.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Kristiina Jokinen
    1. Department of Computer Sciences, University of Tampere, Tampere
