
The combined effect of speech codec quality and transmission delay on human performance during complex spoken interactions

R. Mannell

Abstract

This paper examines how speech codec output quality interacts with simulated satellite or VoIP transmission delay to affect talker performance in a complex spoken interaction. A hardware test codec, in both single and tandem configurations, was compared against a number of processed-speech reference conditions with known Mean Opinion Scores (MOS) to determine the relative subjective quality of the two test codec conditions. These two codec conditions, plus an additional higher-quality condition, were then used in an experiment that examined the interaction of transmitted speech quality and simulated transmission delay on a two-talker speech shadowing task and an accompanying error repair task. One person (the “reader”) read a passage aloud. The second person (the “shadower”) shadowed the passage by immediately repeating the words spoken by the reader. The reader, whilst reading, also listened for errors made by the shadower and repaired them by reporting them verbally to the shadower. A significant interaction between codec quality and transmission delay was found for the error repair task, but only in cases where the shadower made a significant number of errors. These results suggest that, in highly complex interactions that impose significant cognitive load, human performance will degrade more rapidly with increasing delay on transmission systems whose speech codecs produce lower-quality output. This is assumed to be due to the additional demands that transmission delay places on working memory.
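The central finding is an interaction: the effect of delay on error-repair performance depends on codec quality. One way to make "degrades more rapidly with increasing delay" concrete is to fit a least-squares slope of performance against delay separately for each codec condition and then compare the slopes. The sketch below is a minimal illustration under stated assumptions, not the paper's analysis: the score is a hypothetical error-repair measure where higher is better (for example, the proportion of shadower errors the reader repaired), and the function names and condition labels are invented for illustration. A slope difference is only a descriptive contrast; it is not a significance test for the interaction.

```python
from typing import Dict, List, Tuple

# (delay_ms, score) pairs for one codec condition; the score is a hypothetical
# error-repair measure where higher means better performance.
Points = List[Tuple[float, float]]


def ols_slope(points: Points) -> float:
    """Ordinary least-squares slope of score against delay (score units per ms)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two delay levels")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("delay levels must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def delay_slope_difference(scores: Dict[str, Points], lower: str, higher: str) -> float:
    """Slope for the lower-quality condition minus slope for the higher-quality one.

    A negative result means performance falls off faster with delay for the
    lower-quality codec, the pattern the abstract describes; a value near
    zero means no interaction in this descriptive sense.
    """
    return ols_slope(scores[lower]) - ols_slope(scores[higher])
```

For instance, with invented scores of 0.9 and 0.7 at 0 and 500 ms for a lower-quality condition, and 0.95 and 0.9 for a higher-quality one, the slopes are -0.0004 and -0.0001 per ms and the difference is -0.0003, so the lower-quality condition degrades four times as fast. These numbers are made up for illustration and are not results from the study.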

Keywords

Speech codec quality; Transmission delay; Speech shadowing; Cognitive load; Working memory

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Centre for Language Sciences, Department of Linguistics, Macquarie University, Sydney, Australia