Current Problems in Automatic Speech Recognition

  • W. A. Ainsworth
  • P. D. Green

Abstract

Automatic speech recognition may be defined as any process which decodes the acoustic signal produced by the human voice into a sequence of linguistic units which contain the message that the speaker wishes to convey. At one extreme this includes the “phonetic typewriter,” a hypothetical device which types any words spoken into it, and at the other, “speech understanding systems” which extract the intended meaning from the sounds and carry out some appropriate action such as replying to a question or controlling a robot. During the last two decades the emphasis in research in automatic speech recognition has gradually shifted from the former type of device to the latter.

Copyright information

© Plenum Press, New York 1978

Authors and Affiliations

  • W. A. Ainsworth (1)
  • P. D. Green (2)

  1. Department of Communication, University of Keele, Keele, Staffordshire, England
  2. Department of Computing, North Staffordshire Polytechnic, Stafford, England
