Military Applications: Human Factors Aspects of Speech-Based Systems



When considering military applications of speech-based interactive systems, some are specific to the military domain, while others are more general, for example, office-type applications (dictation, directory and information enquiries; see Jokinen [23]) and training. The emphasis in this chapter is on the more specific military applications, although some of the general applications are also discussed. Two key components of speech-based interactive systems are Automatic Speech Recognition (ASR) and speech synthesis. These are covered extensively in earlier chapters, so they are considered here only in terms of the characteristics relevant to the military domain. A final comment concerns the definition of 'military'. Traditionally, the military is thought of as comprising the Air Force, Army, Navy and Marine Corps. In addition, there are peripheral activities relating to the military, such as Air Traffic Control (ATC) and defence activities, for example, the military police and the security agencies. These are also considered in this chapter, as part of the section on applications.
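To make the relationship between these two components concrete, the sketch below shows a minimal command-and-control loop: an ASR front end feeding a constrained command grammar, with spoken feedback from a synthesiser. This is a hypothetical illustration, not taken from the chapter; recognize() and synthesize() are invented stand-ins for real engines, and the grammar entries are made up.

```python
from typing import Optional

# Hypothetical sketch: ASR front end + speech-synthesis back end, joined by a
# small constrained command grammar of the kind typically used in cockpit and
# vehicle applications. recognize() and synthesize() are invented stand-ins,
# not the API of any real engine.

COMMAND_GRAMMAR = {
    ("radio", "channel"): "switch_radio_channel",
    ("display", "map"): "show_map",
    ("fuel", "status"): "report_fuel_status",
}

def recognize(audio: bytes) -> str:
    """Stand-in for an ASR engine: returns a canned word hypothesis."""
    return "radio channel two"

def interpret(hypothesis: str) -> Optional[str]:
    """Match the recognised words against the constrained grammar."""
    words = hypothesis.lower().split()
    for keywords, action in COMMAND_GRAMMAR.items():
        if all(k in words for k in keywords):
            return action
    return None  # out-of-grammar utterance

def synthesize(text: str) -> None:
    """Stand-in for a speech synthesiser: prints the spoken feedback."""
    print(f"[spoken] {text}")

if __name__ == "__main__":
    action = interpret(recognize(b""))
    if action is not None:
        synthesize(f"Confirmed: {action.replace('_', ' ')}")
    else:
        synthesize("Say again.")  # readback request on a recognition miss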


Keywords: Speech recognition · Automatic speech recognition · Impulse noise · Speech synthesis · Military application


References

1. Anderson, T., Pigeon, S., Swail, C., Geoffrois, E., Bruckner, C. (2004). Implications of multilingual interoperability of speech technology for military use. NATO Research and Technology Organization, Report RTO-TR-IST-011, AC/323(IST-011)TP/26.
2. Baber, C., Noyes, J. M. (1996). Automatic speech recognition in adverse environments. Hum. Factors, 38, 142–155.
3. Benincasa, D. S., Smith, S. E., Smith, M. J. (2004). Impacting the war on terrorism with language translation. In: Proc. IEEE Aerospace Conf., Big Sky, MT, USA, 3283–3288.
4. Bolia, R. S., Slyh, R. E. (2003). Perception of stress and speaking style for selected elements of the SUSAS database. Speech Commun., 40, 493–501.
5. Calhoun, G., Draper, M. H. (2006). Multi-sensory interfaces for remotely operated vehicles. In: Cooke, N. J., Pringle, H. L., Pedersen, H. K., Connor, O. (eds) Advances in Human Performance and Cognitive Engineering Research, vol. 7: Human Factors of Remotely Operated Vehicles, 149–163.
6. Canadian Broadcasting Corporation (CBC) News (2006). Women in the military – International. In: CBC News Online, May 30, 2006. Available online.
7. Carr, O. (2002). Interfacing COTS speech recognition and synthesis software to a Lotus Notes military command and control database. Defence Science and Technology Organisation, Information Sciences Laboratory, Edinburgh, Australia. Research Report AR-012-484. Available online, May 2006.
8. Chengguo, L., Jiqing, H., Wang, C. (2005). Stressful speech recognition method based on difference subspace integrated with dynamic time warping. Acta Acoust., 30 (3), 229–234.
9. Cresswell Starr, A. F. (1993). Is control by voice the right answer for the avionics environment? In: Baber, C., Noyes, J. M. (eds) Interactive Speech Technology: Human Factors Issues in the Application of Speech Input/Output to Computers. Taylor & Francis, London, 85–97.
10. Deng, L., Huang, X. (2004). Challenges in adopting speech recognition. Commun. ACM, 47 (1), 69–73.
11. Deng, L., O'Shaughnessy, D. (2003). Speech Processing – A Dynamic and Optimization-Oriented Approach. Marcel Dekker, New York.
12. Doddington, G., Liggett, W., Martin, A., Przybocki, M., Reynolds, D. (1998). SHEEP, GOATS, LAMBS and WOLVES: A statistical analysis of speaker performance in the NIST 1998 speaker recognition evaluation. In: Proc. Int. Conf. on Spoken Language Processing, ICSLP '98, Sydney, Australia, 608–611.
13. Draper, M., Calhoun, G., Ruff, H., Williamson, D., Barry, T. (2003). Manual versus speech input for unmanned aerial vehicle control station operations. In: Proc. 47th Annual Meeting of the Human Factors and Ergonomics Society, Denver, CO, USA, 109–113.
14. Francis, A. L., Nusbaum, H. C. (1999). Evaluating the quality of synthetic speech. In: Gardner-Bonneau, D. (ed) Human Factors and Voice Interactive Systems. Kluwer, Norwell, MA, 63–97.
15. Frederking, R. E., Black, A. W., Brown, R. D., Moody, J., Steinbrecher, E. (2002). Field testing the Tongues speech-to-speech machine translation system, 160–164. Available online, May 2006.
16. Frigola, M., Fernandez, J., Aranda, J. (2003). Visual human machine interface by gestures. In: Proc. IEEE Int. Conf. on Robotics & Automation, Taipei, Taiwan, 386–391.
17. Fuegen, C., Rogina, I. (2000). Integrating dynamic speech modalities into context decision trees. In: Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP 2000, Istanbul, Turkey, vol. 3, 1277–1280.
18. Goffin, V., Allauzen, C., Bocchieri, E., Hakkani-Tür, D., Ljolje, A., Parthasarathy, S., Rahim, M., Riccardi, G., Saraclar, M. (2005). The AT&T Watson speech recogniser. In: Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP 2005, Philadelphia, PA, I-1033–I-1036.
19. Haas, E., Shankle, R., Murray, H., Travers, D., Wheeler, T. (2000). Issues relating to automatic speech recognition and spatial auditory displays in high noise, stressful tank environments. In: Proc. IEA 2000/HFES 2000 Congress. Human Factors and Ergonomics Society, Santa Monica, CA, vol. 3, 754–757.
20. Halverson, C. A., Horn, D. B., Karat, C. M., Karat, J. (1999). The beauty of errors: Patterns of error correction in desktop speech systems. In: Sasse, M. A., Johnson, C. (eds) Proc. Human–Computer Interaction – INTERACT '99. IOS Press, Amsterdam.
21. Hu, C., Meng, M. Q., Liu, P. X., Wang, X. (2003). Visual gesture recognition for human–machine interface of robot teleoperation. In: Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Las Vegas, NV, USA, 1560–1565.
22. Huang, X., Acero, A., Hon, H.-W. (2001). Spoken Language Processing – A Guide to Theory, Algorithms, and System Development. Prentice Hall, Upper Saddle River, NJ.
23. Jokinen, K. (2006). Constructive dialogue management for speech-based interaction systems. In: Proc. Intelligent User Interfaces '06, Sydney, Australia. ACM Press, New York, NY.
24. Junqua, J. (2000). Robust Speech Recognition in Embedded Systems and PC Applications. Kluwer, Norwell, MA.
25. Kane, T. (2006). Who are the recruits? The demographic characteristics of U.S. military enlistment, 2003–2005. The Heritage Foundation, Washington, DC.
26. Kirchhoff, K., Vergyri, D. (2004). Cross-dialectal acoustic data sharing for Arabic speech recognition. In: Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP 2004, vol. 1, 765–768.
27. Kudo, I., Nakama, T., Watanabe, T., Kameyama, R. (1996). Data collection of Japanese dialects and its influence into speech recognition. In: Proc. 4th Int. Conf. on Spoken Language Processing (ICSLP), vol. 4, 2021–2024.
28. Lai, J., Wood, D., Considine, M. (2000). The effect of task conditions on the comprehensibility of synthetic speech. CHI Lett., 2, 321–328.
29. Leeks, C. (1986). Operation of a speech recogniser under whole body vibration. Technical Memorandum FDS(F) 634, RAE, Farnborough, UK.
30. Leggatt, A. P., Noyes, J. M. (2004). A holistic approach to the introduction of automatic speech recognition technology in ground combat vehicles. Mil. Psychol., 16, 81–97.
31. Lippmann, R. (1997). Speech recognition by machines and humans. Speech Commun., 22, 1–15.
32. Littlefield, J., Hashemi-Sakhtsari, A. (2002). The effects of background noise on the performance of an Automatic Speech Recogniser. Defence Science and Technology Organisation, Information Sciences Laboratory, Edinburgh, Australia. Research Report AR-012-500. Available online, May 2006.
33. Marshall, S. L. (2005). Concept of operations (CONOPS) for foreign language and speech translation technologies in a coalition military environment. Unpublished Master's Thesis, Naval Postgraduate School, Monterey, CA.
34. McCarty, D. (2000). Building the business case for speech in call centers: Balancing customer experience and cost. In: Proc. SpeechTEK, New York, 15–26.
35. Minker, W., Bühler, D., Dybkjaer, L. (2005). Spoken Multimodal Human–Computer Dialogue in Mobile Environments. Springer, Dordrecht.
36. Mitsugami, I., Ukita, N., Kidode, M. (2005). Robot navigation by eye pointing. In: Proc. 4th Int. Conf. on Entertainment Computing (ICEC), Sanda, Japan, 256–267.
37. Moore, T. J., Bond, Z. S. (1987). Acoustic-phonetic changes in speech due to environmental stressors: Implications for speech recognition in the cockpit. In: Proc. 4th Int. Symp. on Aviation Psychology, Aviation Psychology Laboratory, Columbus, OH.
38. Murray, I. R., Baber, C., South, A. (1996). Towards a definition and working model of stress and its effects on speech. Speech Commun., 20, 3–12.
39. Myers, B., Hudson, S. E., Pausch, R. (2000). Past, present, and future of user interface software tools. ACM Trans. Comput. Hum. Interact., 7, 3–28.
40. Neely, H. E., Belvin, R. S., Fox, J. R., Daily, J. M. (2004). Multimodal interaction techniques for situational awareness and command of robotic combat entities. In: Proc. IEEE Aerospace Conf., Big Sky, MT, USA, 3297–3305.
41. Newman, D. (2000). Speech interfaces that require less human memory. In: Basson, S. (ed) AVIOS Proc. Speech Technology & Applications Expo, San Jose, CA, 65–69.
42. North, R. A., Bergeron, H. (1984). Systems concept for speech technology application in general aviation. In: Proc. 6th Digital Avionics Systems Conf. (A85-17801 06-01). American Institute of Aeronautics and Astronautics, New York, AIAA-84-2639, 184–189.
43. North Atlantic Treaty Organisation (NATO) Committee for Women in the NATO Forces (2006). Personnel comparison in deployments 2006. Available online, December 2006.
44. Noyes, J. M., Hellier, E., Edworthy, J. (2006). Speech warnings: A review. Theor. Issues Ergonomics Sci., 7 (6), 551–571.
45. Oberteuffer, J. (1994). Commercial applications of speech interface technology: An industry at the threshold. In: Roe, R., Wilpon, J. (eds) Voice Communication Between Humans and Machines. National Academy Press, Washington, DC, 347–356.
46. Oviatt, S. L. (2000). Multimodal system processing in mobile environments. CHI Lett., 2 (2), 21–30.
47. Paper, D. J., Rodger, J. A., Simon, S. J. (2004). Voice says it all in the Navy. Commun. ACM, 47, 97–101.
48. Pearce, D., Hirsch, H. G. (2000). The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In: Proc. 6th Int. Conf. on Spoken Language Processing, ICSLP 2000, Beijing, China.
49. Pellom, B., Hacioglu, K. (2003). Recent improvements in the CU Sonic ASR system for noisy speech: The SPINE task. In: Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP 2003, Hong Kong, China, I-4–I-7.
50. Perzanowski, D., Brock, D., Blisard, S., Adams, W., Bugajska, M., Schultz, A. (2003). Finding the FOO: A pilot study for a multimodal interface. In: Proc. IEEE Conf. on Systems, Man and Cybernetics, vol. 4, 3218–3223.
51. Picone, J. (1990). The demographics of speaker independent digit recognition. In: Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP 1990, vol. 1, 105–108.
52. Ralston, J. V., Pisoni, D. B., Lively, S. E., Greene, B. G., Mullennix, J. W. (1991). Comprehension of synthetic speech produced by rule: Word monitoring and sentence-by-sentence listening times. Hum. Factors, 33, 471–491.
53. Rodger, J. A., Pendharkar, P. C., Paper, D. C., Trank, T. V. (2001). Military applications of natural language processing and software. In: Proc. 7th Americas Conf. on Information Systems, Boston, MA, USA, 1188–1193.
54. Rodger, J. A., Pendharkar, P. C. (2004). A field study of the impact of gender and user's technical experience on the performance of voice-activated medical tracking application. Int. J. Hum. Comput. Studies, 60 (5–6), 529–544.
55. Rodger, J. A., Trank, T. V., Pendharkar, P. C. (2002). Military applications of natural language processing and software. Ann. Cases Inf. Technol., 5, 12–28.
56. Sawhney, N., Schmandt, C. (2000). Nomadic Radio: Speech and audio interaction for contextual messaging in nomadic environments. ACM Trans. Comput. Hum. Interact., 7 (3), 353–383.
57. Shneiderman, B. (2000). The limits of speech recognition. Commun. ACM, 43, 63–65.
58. Singh, R., Seltzer, M. L., Raj, B., Stern, R. M. (2001). Speech in noisy environments: Robust automatic segmentation, feature extraction, and hypothesis combination. In: Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP 2001, Salt Lake City, UT, vol. 1, 273–276.
59. Strand, O. M., Holter, T., Egeberg, A., Stensby, S. (2003). On the feasibility of ASR in extreme noise using the PARAT earplug communication terminal. In: Proc. IEEE Workshop on Automatic Speech Recognition and Understanding, St. Thomas, Virgin Islands, 315–320.
60. Tashakkori, R., Bowers, C. (2003). Similarity analysis of voice signals using wavelets with dynamic time warping. Proc. SPIE, 5102, 168–177.
61. Viswanathan, M., Viswanathan, M. (2005). Measuring speech quality for text-to-speech systems: Development and assessment of a modified mean opinion score (MOS) scale. Comput. Speech Lang., 19, 55–83.
62. Wagner, M. (1997). Speaker characteristics in speech and speaker recognition. In: Proc. 1997 IEEE TENCON Conf., Brisbane, Australia, part 2, 626.
63. Weimer, C., Ganapathy, S. K. (1989). A synthetic visual environment with hand gesturing and voice input. In: Proc. HCI International '89: 3rd Int. Conf. on Human–Computer Interaction, September 18–22, 1989, Boston, MA, USA.
64. Weinstein, C. J. (1995). Military and government applications of human–machine communication by voice. Proc. Natl Acad. Sci. USA, 92, 10011–10016. (Reprinted in: Roe, R., Wilpon, J. (eds) Voice Communication Between Humans and Machines. National Academy Press, Washington, DC, 357–370.)
65. White, R. W., Parks, D. L., Smith, W. D. (1984). Potential flight applications for voice recognition and synthesis systems. In: Proc. 6th AIAA/IEEE Digital Avionics Systems Conf., 84-2661-CP.
66. Williamson, D. T., Draper, M. H., Calhoun, G. L., Barry, T. P. (2005). Commercial speech recognition technology in the military domain: Results of two recent research efforts. Int. J. Speech Technol., 8, 9–16.
67. Wilpon, J. G., Jacobsen, C. N. (1996). A study of speech recognition for children and the elderly. In: Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Atlanta, GA, USA, vol. 1, 349–352.
68. Yoshizaki, M., Kuno, Y., Nakamura, A. (2002). Human–robot interface based on the mutual assistance between speech and vision. In: Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Swiss Federal Institute of Technology, Lausanne, Switzerland, 1308–1313.
69. Zhou, G., Hansen, J. H. L., Kaiser, J. F. (2001). Nonlinear feature based classification of speech under stress. IEEE Trans. Speech Audio Process., 9 (3), 201–216.
70. Zue, V. (2004). Eighty challenges facing speech input/output technologies. In: Proc. From Sound to Sense: 50+ Years of Discovery in Speech Communication, MIT, Boston, MA, USA, B179–B195.

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

1. University of Bristol, Bristol, UK
2. U.S. Army Research Laboratory, Adelphi, MD, USA
