Application of Speech Technology in Vehicles

  • Fang Chen
  • Ing-Marie Jonsson
  • Jessica Villing
  • Staffan Larsson


Speech technology has been regarded as one of the most interesting technologies for operating in-vehicle information systems. Cameron [1] has pointed out that people are more likely to use a speech system when at least one of four criteria is met: (1) they are offered no choice; (2) it corresponds to the privacy of their surroundings; (3) their hands or eyes are busy with another task; and (4) it is quicker than any alternative. For the driver, driving is a typical “hands and eyes are busy” task. In most situations, the driver is either alone in the car or accompanied by passengers who know each other well, so the “privacy of surroundings” criterion is also met. There is a long history of interest in applying speech technology to the control of in-vehicle information systems, and some commercial cars are already equipped with embedded speech technology. In 1996, the Mercedes-Benz S-Class introduced Linguatronic, the first generation of in-car speech system for anybody who drives a car [2]. Since then, the number of in-vehicle applications using speech technology has been increasing [3].


Keywords: Speech Recognition, Secondary Task, Speech Enhancement, Dialogue System, Driving Simulator


  1. Cameron, H. (2000). Speech at the interface. In: Workshop on "Voice Operated Telecom Services". Ghent, Belgium, COST 249.
  2. Heisterkamp, P. (2001). Linguatronic - Product-level speech system for Mercedes-Benz Cars. In: Proc. HLT, San Diego, CA, USA.
  3. Hamerich, S. W. (2007). Towards advanced speech driven navigation systems for cars. In: 3rd IET Int. Conf. on Intelligent Environments, IE07, Sept. 24-25, Ulm, Germany.
  4. Goose, S., Djennane, S. (2002). WIRE3: Driving around the information super-highway. Pers. Ubiquitous Comput., 6, 164-175.
  5. Nass, C., Jonsson, I.-M., Harris, H., Reaves, B., Endo, J., Brave, S., Takayama, L. (2005). Improving automotive safety by pairing driver emotion and car voice emotion. In: CHI '05 Extended Abstracts on Human Factors in Computing Systems. ACM Press, New York, NY.
  6. Nass, C., Brave, S. B. (2005). Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship. MIT Press, Cambridge, MA.
  7. Bishop, R. (2005). Intelligent Vehicle Technology and Trends. Artech House, Boston.
  8. van de Weijer, C. (2008). Keynote 1: Dutch connected traffic in practice and in the future. In: IEEE Intelligent Vehicles Sympos. Eindhoven, The Netherlands, June 4-6.
  9. Gardner, M. (2008). Nomadic device integration in Aide. In: Proc. AIDE Final Workshop and Exhibition. April 15-16, Goteborg, Sweden.
  10. Johansson, E., Engstrom, J., Cherri, C., Nodari, E., Toffetti, A., Schindhelm, R., Gelau, C. (2004). Review of existing techniques and metrics for IVIS and ADAS assessment. EU Information Society Technology (IST) program IST-1-507674-IP: Adaptive Integrated Driver-Vehicle Interface (AIDE).
  11. Lee, J. D., Caven, B., Haake, S., Brown, T. L. (2001). Speech-based interaction with in-vehicle computer: The effect of speech-based e-mail on driver's attention to the roadway. Hum. Factors, 43, 631-640.
  12. Barón, A., Green, P. (2006). Safety and Usability of Speech Interfaces for In-Vehicle Tasks while Driving: A Brief Literature Review. Transportation Research Institute (UMTRI), The University of Michigan.
  13. Saad, F., Hjalmdahl, M., Cañas, J., Alonso, M., Garayo, P., Macchi, L., Nathan, F., Ojeda, L., Papakostopoulos, V., Panou, M., Bekiaris, E. (2004). Literature review of behavioural effects. EU Information Society Technology (IST) program: IST-1-507674-IP, Adaptive Integrated Driver-Vehicle Interface (AIDE).
  14. Treffner, P. J., Barrett, R. (2004). Hands-free mobile phone speech while driving degrades coordination and control. Transport. Res. F, 7, 229-246.
  15. Esbjornsson, M., Juhlin, O., Weilenmann, A. (2007). Drivers using mobile phones in traffic: An ethnographic study of interactional adaption. Int. J. Hum. Comput. Inter., Special Issue on: In-Use, In-Situ: Extending Field Research Methods, 22 (1), 39-60.
  16. Jonsson, I.-M., Chen, F. (2006). How big is the step for driving simulators to driving a real car? In: IEA 2006 Congress, Maastricht, The Netherlands, July 10-14.
  17. Chen, F., Jordan, P. (2008). Zonal adaptive workload management system: Limiting secondary task while driving. In: IEEE Intelligent Transportation System, IVs' 08, Eindhoven, The Netherlands, June 2-6.
  18. Esbjörnsson, M., Brown, B., Juhlin, O., Normark, D., Östergren, M., Laurier, E. (2006). Watching the cars go round and round: designing for active spectating. In: Proc. SIGCHI Conf. on Human Factors in computing systems, Montréal, Québec, Canada, 2006.
  19. Recarte, M. A., Nunes, L. M. (2003). Mental workload while driving: Effects on visual search, discrimination, and decision making. J. Exp. Psychol.: Appl., 9 (2), 119-137.
  20. Victor, T. W., Harbluk, J. L., Engstrom, J. A. (2005). Sensitivity of eye-movement measures to in-vehicle task difficulty. Transport. Res. Part F, 8 (2), 167-190.
  21. Hart, S. G., Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In: Hancock, P. A., Meshkati, N. (eds) Human Mental Workload. Elsevier Science Publishers B.V., North-Holland, 139-183.
  22. Pauzie, A., Sparpedon, A., Saulnier, G. (2007). Ergonomic evaluation of a prototype guidance system in an urban area. Discussion about methodologies and data collection tools, in Vehicle Navigation and Information Systems Conference. In: Proc. in conjunction with the Pacific Rim TransTech Conf. 6th Int. VNIS. "A Ride into the Future", Seattle, WA, USA.
  23. Wang, E., Chen, F. (2008). A new measurement for simulator driving performance in situation without interfere from other vehicles, International Journal of Transportation Systems F. AEI 2008. In: Applied Human Factors and Ergonomics 2008, 2nd Int. Conf., Las Vegas, USA, July 14-17.
  24. Wilson, G. F., Lambert, J. D., Russell, C. A. (2002). Performance enhancement with real-time physiologically controlled adaptive aiding. In: HFA Workshop: Psychophysiological Application to Human Factors, March 11-12, 2002. Swedish Center for Human Factors in Aviation.
  25. Wilson, G. F. (2002). Psychophysiological test methods and procedures. In: HFA Workshop: Psychophysiological Application to Human Factors, March 11-12, 2002. Swedish Center for Human Factors in Aviation.
  26. Lai, J., Cheng, K., Green, P., Tsimhoni, O. (2001). On the road and on the web? Comprehension of synthetic and human speech while driving. In: Conf. on Human Factors and Computing Systems, CHI 2001, 31 March-5 April 2001. Seattle, Washington, USA.
  27. Hermansky, H., Morgan, N. (1994). RASTA processing of speech. IEEE Trans. Speech Audio Process., 2 (4), 578-589.
  28. Kermorvant, C. (1999). A comparison of noise reduction techniques for robust speech recognition. IDIAP research report, IDIAP-RR-99-10, Dalle Molle Institute for perceptual Artificial Intelligence, Valais, Switzerland.
  29. Furui, S. (1986). Speaker-independent isolated word recognition using dynamic features of speech spectrum. IEEE Trans. Acoustics, Speech Signal Process., 34 (1), 52-59.
  30. Mansour, D., Juang, B.-H. (1989). The short-time modified coherence representation and noisy speech recognition. IEEE Trans. Acoustics Speech Signal Process., 37 (6), 795-804.
  31. Hernando, J., Nadeu, C. (1997). Linear prediction of the one-sided autocorrelation sequence for noisy speech recognition. IEEE Trans. Speech Audio Process., 5 (1), 80-84.
  32. Chen, J., Paliwal, K. K., Nakamura, S. (2003). Cepstrum derived from differentiated power spectrum for robust speech recognition. Speech Commun., 41 (2-3), 469-484.
  33. Yuo, K.-H., Wang, H.-C. (1998). Robust features derived from temporal trajectory filtering for speech recognition under the corruption of additive and convolutional noises. In: Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, April 21-24, 1997, Munich, Bavaria, Germany.
  34. Yuo, K.-H., Wang, H.-C. (1999). Robust features for noisy speech recognition based on temporal trajectory filtering of short-time autocorrelation sequences. Speech Commun., 28, 13-24.
  35. Lebart, K., Boucher, J. M. (2001). A new method based on spectral subtraction for speech dereverberation. Acta Acustica united with Acustica, 87, 359-366.
  36. Lee, C.-H., Soong, F. K., Paliwal, K. K. (1996). Automatic Speech and Speaker Recognition. Kluwer, Norwell.
  37. Gales, M. J. F., Young, S. J. (1995). Robust speech recognition in additive and convolutional noise using parallel model combination. Comput. Speech Lang., 9, 289-307.
  38. Gales, M. J. F., Young, S. J. (1996). Robust continuous speech recognition using parallel model combination. IEEE Trans. Speech Audio Process., 4 (5), 352-359.
  39. Acero, A., Deng, L., Kristjansson, T., Zhang, J. (2000). HMM adaptation using vector Taylor series for noisy speech recognition. In: Proc. ICASSP, June 05-09, 2000, Istanbul, Turkey.
  40. Kim, D. Y., Un, C. K., Kim, N. S. (1998). Speech recognition in noisy environments using first-order vector Taylor series. Speech Commun., 24 (1), 39-49.
  41. Visser, E., Otsuka, M., Lee, T.-W. (2003). A spatio-temporal speech enhancement scheme for robust speech recognition in noisy environments. Speech Commun., 41, 393-407.
  42. Farahani, G., Ahadi, S. M., Homayounpour, M. M. (2007). Features based on filtering and spectral peaks in autocorrelation domain for robust speech recognition. Comput. Speech Lang., 21, 187-205.
  43. Choi, E. H. C. (2004). Noise robust front-end for ASR using spectral subtraction, spectral flooring and cumulative distribution mapping. In: Proc. 10th Australian Int. Conf. on Speech Science & Technology. Macquarie University, Sydney, December 8-10.
  44. Fernandez, R., Corradini, A., Schlangen, D., Stede, M. (2007). Towards reducing and managing uncertainty in spoken dialogue systems. In: The Seventh International Workshop on Computational Semantics (IWCS-7). Tilburg, The Netherlands, Jan 10-12.
  45. Skantze, G. (2005). Exploring human error recovery strategies: Implications for spoken dialogue systems. Speech Commun., 45 (3), 325-341.
  46. Gellatly, A. W., Dingus, T. A. (1998). Speech recognition and automotive applications: using speech to perform in-vehicle tasks. In: Proc. Human Factors and Ergonomics Society 42nd Annual Meeting, October 5-9, 1998, Hyatt Regency Chicago, Chicago, Illinois.
  47. Greenberg, J., Tijerina, L., Curry, R., Artz, B., Cathey, L., Grant, P., Kochhar, D., Kozak, K., Blommer, M. (2003). Evaluation of driver distraction using an event detection paradigm. In: Proc. Transportation Research Board Annual Meetings, January 12-16, 2003, Washington, DC.
  48. McCallum, M. C., Campbell, J. L., Richman, J. B., Brown, J. (2004). Speech recognition and in-vehicle telematics devices: Potential reductions in driver distraction. Int. J. Speech Technol., 7, 25-33.
  49. Bernsen, N. O., Dybkjaer, L. (2002). A multimodal virtual co-driver's problems with the driver. In: ISCA Tutorial and Research Workshop on Multi-Modal Dialogue in Mobile Environments Proceedings. Kloster Irsee, Germany, June 17-19.
  50. Geutner, P., Steffens, F., Manstetten, D. (2002). Design of the VICO Spoken Dialogue System: Evaluation of User Expectations by Wizard-of-Oz Experiments. In: Proc. 3rd Int. Conf. on Language Resources and Evaluation (LREC 2002). Las Palmas, Spain, May.
  51. Villing, J., Larsson, S. (2006). Dico: A multimodal menu-based in-vehicle dialogue system. In: The 10th Workshop on the Semantics and Pragmatics of Dialogue, Brandial'06 (Sem-Dial 10). Potsdam, Germany, Sept 11-13.
  52. Larsson, S. (2002). Issue-based dialogue management. PhD Thesis, Goteborg University.
  53. Bringert, B., Ljunglöf, P., Ranta, A., Cooper, R. (2005). Multimodal dialogue systems grammars. In: The DIALOR'05, 9th Workshop on the Semantics and Pragmatics of Dialogue. Nancy (France), June 9-11, 2005.
  54. Oviatt, S. (2004). When do we interact multimodally? Cognitive load and multimodal communication patterns. In: Proc. 6th Int. Conf. on Multimodal Interfaces. Pennsylvania, Oct 14-15.
  55. Bernsen, N. O., Dybkjaer, L. (2001). Exploring natural interaction in the car. In: Proc. CLASS Workshop on Natural Interactivity and Intelligent Interactive Information Representation, Verona, Italy, Dec 2001.
  56. Esbjörnsson, M., Juhlin, O., Weilenmann, A. (2007). Drivers using mobile phones in traffic: An ethnographic study of interactional adaption. Int. J. Hum. Comput. Interact., Special Issue on In-Use, In-Situ: Extending Field Research Meth., 22 (1), 39-60.
  57. Jonsson, I.-M., Nass, C., Endo, J., Reaves, B., Harris, H., Ta, J. L., Chan, N., Knapp, S. (2004). Don't blame me I am only the driver: Impact of blame attribution on attitudes and attention to driving task. In: CHI '04 Extended Abstracts on Human Factors in Computing Systems, Vienna, Austria.
  58. Jonsson, I.-M., Zajicek, M. (2005). Selecting the voice for an in-car information system for older adults. In: Human Computer Interaction Int. Las Vegas, Nevada, USA.
  59. Jonsson, I.-M., Zajicek, M., Harris, H., Nass, C. I. (2005). Thank you I did not see that: In-car speech-based information systems for older adults. In: Conf. on Human Factors in Computing Systems. ACM Press, Portland, OR.
  60. Jonsson, I.-M., Nass, C. I., Harris, H., Takayama, L. (2005). Got Info? Examining the consequences of inaccurate information systems. In: Int. Driving Symp. on Human Factors in Driver Assessment, Training, and Vehicle Design. Rockport, Maine.
  61. Gross, J. J. (1999). Emotion and emotion regulation. In: Pervin, L. A., John, O. P. (eds) Handbook of Personality: Theory and Research. New York: Guilford, 525-552.
  62. Picard, R. W. (1997). Affective Computing. MIT Press, Cambridge, MA.
  63. Clore, G. L., Gasper, K. (2000). Feeling is believing: Some affective influences on belief. In: Frijda, N. H., Manstead, A. S. R., Bem, S. (eds) Emotions and Beliefs: How Feelings Influence Thoughts, Editions de la Maison des Sciences de l'Homme and Cambridge University Press (jointly published), Paris/Cambridge, 10-44.
  64. Gross, J. J. (1998). Antecedent- and response-focused emotion regulation: Divergent consequences for experience, expression, and physiology. J. Personality Social Psychol., 74, 224-237.
  65. Davidson, R. J. (1994). On emotion, mood, and related affective constructs. In: Ekman, P., Davidson, R. J. (eds) The Nature of Emotion, Oxford University Press, New York, 51-55.
  66. Bower, G. H., Forgas, J. P. (2000). Affect, memory, and social cognition. In: Eich, E., Kihlstrom, J. F., Bower, G. H., Forgas, J. P., Niedenthal, P. M. (eds) Cognition and Emotion. Oxford University Press, Oxford, 87-168.
  67. Groeger, J. A. (2000). Understanding Driving: Applying Cognitive Psychology to a Complex Everyday Task. Psychology Press, Philadelphia, PA.
  68. Lunenfeld, H. (1989). Human factor considerations of motorist navigation and information systems. In: Proc. Vehicle Navigation and Information Systems, September 11-13, Toronto, Canada.
  69. Srinivasan, R., Jovanis, P. (1997). Effect of in-vehicle route guidance systems on driver workload and choice of vehicle speed: Findings from a driving simulator experiment. In: Noy, Y. I. (ed) Ergonomics and Safety of Intelligent Driver Interfaces, Lawrence Erlbaum Associates Inc., Publishers, Mahwah, New Jersey, 97-114.
  70. Horswill, M., McKenna, F. (1999). The effect of interference on dynamic risk-taking judgments. Br. J. Psychol., 90, 189-199.
  71. Strayer, D., Drews, F., Johnston, W. (2003). Cell phone induced failures of visual attention during simulated driving. J. Exp. Psychol.: Appl., 9 (1), 23-32.
  72. Merat, N., Jamson, A. H. (2005). Shut up I'm driving! Is talking to an inconsiderate passenger the same as talking on a mobile telephone. In: 3rd Int. Driving Symp. on Human Factors in Driver Assessment, Training, and Vehicle Design. Rockport, Maine.
  73. Nass, C. et al. (2005). Improving automotive safety by pairing driver emotion and car voice emotion. In: CHI '05 Extended Abstracts on Human Factors in Computing Systems. ACM Press, New York, NY.
  74. Brouwer, W. H. (1993). Older drivers and attentional demands: consequences for human factors research. In: Proc. Human Factors and Ergonomics Society-Europe, Chapter on Aging and Human Factors. Soesterberg, Netherlands, 93-106.
  75. Ponds, R. W., Brouwer, W. H., Wolffelaar, P. C. (1988). Age differences in divided attention in a simulated driving task. J. Gerontol., 43 (6), 151-156.
  76. Zajicek, M., Hall, S. (1999). Solutions for elderly visually impaired people using the Internet. In: The 'Technology Push' and The User Tailored Information Environment, 5th Eur. Research Consortium for Informatics and Mathematics - ERCIM. 2000. Dagstuhl, Germany, November 28-December 1.
  77. Zajicek, M., Morrissey, W. (2001). Speech output for older visually impaired adults. In: Blandford, A., Vanderdonckt, J., Gray, P. (eds) People and Computers XV - Interacting without Frontiers, Springer Verlag, 503-513.
  78. Fiske, S., Taylor, S. (1991). Social Cognition. McGraw-Hill, New York, NY.
  79. Lazarsfeld, P., Merton, R. (1948). Mass communication-popular taste and organized social action. In: Bryson, L. (ed) Institute for Religious and Social Studies, New York.
  80. Rogers, E., Bhowmik, D. (1970). Homophily-Heterophily: Relational concepts for communication research. Public Opinion Q., 34, 523.
  81. Dulude, L. (2002). Automated telephone answering systems and aging. Behav. Inform. Technol., 21, 171-184.
  82. Van Der Laan, J., Heino, A., De Waard, D. (1997). A simple procedure for the assessment of acceptance of advanced transport telematics. Transport Res. C, 5 (1), 1-10.
  83. Dybkjær, L., Bernsen, N. O., Minker, W. (2004). Evaluation and usability of multimodal spoken language dialogue systems. Speech Commun., 43, 33-54.
  84. Graham, R., Aldridge, L., Carter, C., Lansdown, T. C. (1999). The design of in-car speech recognition interfaces for usability and user acceptance. In: Harris, D. (ed) Engineering Psychology and Cognitive Ergonomics, Ashgate, Aldershot, 313-320.
  85. Larsen, L. B. (2003). Assessment of spoken dialogue system usability - what are we really measuring? In: 8th Eur. Conf. on Speech Communication and Technology - Eurospeech 2003. September 1-4, Geneva, Switzerland.
  86. Zajicek, M., Jonsson, I.-M. (2005). Evaluation and context for in-car speech systems for older adults. In: The 2nd Latin American Conf. on Human-Computer Interaction, CLIHC, Cuernavaca, México, October 23-26, 2005.
  87. Chen, F. (2004). Speech interaction system - how to increase its usability. In: The 8th Int. Conf. on Spoken Language Processing, Interspeech. ICSLP, Jeju Island, Korea, Oct 4-8, 2004.
  88. Norman, D. (2007). The Design of Future Things. Basic Books, New York.
  89. Jordan, P. W. (2000). Designing Pleasurable Products. Taylor & Francis, London and New York.

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Fang Chen (1)
  • Ing-Marie Jonsson (2)
  • Jessica Villing (3)
  • Staffan Larsson (3)

  1. Interaction Design, Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
  2. Toyota Information Technology Center, Palo Alto, USA
  3. Department of Philosophy, Linguistics and Theory of Science, University of Gothenburg, Göteborg, Sweden
