Knowledge-Based Computer Recognition of Speech

  • R. de Mori
Conference paper
Part of the NATO ASI Series book series (volume 45)

Abstract

Shape recognition by fast syntactic methods is possible when there exists a natural linear (one-dimensional) order on component shapes. Such an order may not be available for structural shape descriptions that take the form of unordered, variable-length sets of simpler shapes. In this case, it is tempting to fall back on slower exhaustive correlation, graph matching, and relaxation methods. However, if the structural shapes are themselves simple, it is possible to apply multi-dimensional search techniques for asymptotically fast feature identification. I exploit the fact that many simple shape types may be parameterized as points in low-dimensional spaces where distance models dissimilarity. During training, shapes are clustered heuristically within each class, then among all classes, giving a small set of characteristic shape distributions. Each of these is then associated with a binary feature variable that takes the value one when any input shape falls within the distribution. This mapping from a structural description into a bit-vector is an example of a feature identification method. Selecting such a mapping is slow and heuristic, but it is fully automated, applicable uniformly to many shape types, and controlled by only a few natural statistical parameters. A mapping, once selected, can be applied quickly using kD-trees. Large-scale, statistically significant trials have shown the technique to be superior to simpler fixed mappings in an OCR context.
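The application step described in the abstract (mapping an unordered set of shape points to a bit-vector with a kD-tree) can be illustrated with a minimal sketch. It is not the author's implementation. It assumes each characteristic distribution is approximated by a ball (a center plus a radius), which the abstract does not specify, and it skips the training-time clustering entirely. The names `Cluster`, `Node`, `build`, `collect_hits`, and `to_bits` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set, Tuple

Point = Tuple[float, ...]


@dataclass
class Cluster:
    center: Point   # assumed: distribution summarized by a center...
    radius: float   # ...and a radius (a ball), purely for illustration
    bit: int        # index of the binary feature tied to this cluster


@dataclass
class Node:
    cluster: Cluster
    axis: int
    left: Optional["Node"]
    right: Optional["Node"]


def build(clusters: List[Cluster], depth: int = 0) -> Optional[Node]:
    """Build a kD-tree over cluster centers, splitting at the median."""
    if not clusters:
        return None
    axis = depth % len(clusters[0].center)
    ordered = sorted(clusters, key=lambda c: c.center[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def collect_hits(node: Optional[Node], p: Point, reach: float,
                 out: Set[int]) -> None:
    """Add to `out` the bit of every cluster whose ball contains p.

    `reach` must be at least the largest cluster radius in the tree;
    it bounds how far from the splitting plane a hit can still occur.
    """
    if node is None:
        return
    c = node.cluster
    if sum((a - b) ** 2 for a, b in zip(p, c.center)) <= c.radius ** 2:
        out.add(c.bit)
    delta = p[node.axis] - c.center[node.axis]
    if delta <= reach:       # left centers lie at or below the split
        collect_hits(node.left, p, reach, out)
    if delta >= -reach:      # right centers lie at or above the split
        collect_hits(node.right, p, reach, out)


def to_bits(shapes: Sequence[Point], clusters: List[Cluster]) -> List[int]:
    """Map an unordered set of shape points to a bit-vector."""
    root = build(list(clusters))
    reach = max((c.radius for c in clusters), default=0.0)
    hits: Set[int] = set()
    for p in shapes:
        collect_hits(root, p, reach, hits)
    return [1 if c.bit in hits else 0
            for c in sorted(clusters, key=lambda c: c.bit)]


# Hypothetical 2-D example: three clusters, three input shapes.
clusters = [
    Cluster((0.0, 0.0), 1.0, 0),
    Cluster((5.0, 5.0), 1.5, 1),
    Cluster((10.0, 0.0), 0.5, 2),
]
shapes = [(0.5, 0.2), (5.5, 4.0), (9.0, 0.0)]
print(to_bits(shapes, clusters))  # [1, 1, 0]
```

A bit is set as soon as any one input shape falls inside the corresponding ball, so the order and number of input shapes do not affect the result. The tree only prunes subtrees whose centers are farther than the largest radius from the query point along the splitting axis.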

Keywords

Speech Recognition · Acoustic Property · Continuous Speech · Speech Recognition System · Voice Onset Time

Copyright information

© Springer-Verlag Berlin Heidelberg 1988

Authors and Affiliations

  • R. de Mori (1)
  1. School of Computer Science, McGill University, Montréal, Canada
