Knowledge-Based Computer Recognition of Speech
Shape recognition by fast syntactic methods is possible when the component shapes have a natural linear (one-dimensional) order. Such an order may not exist for structural shape descriptions that take the form of unordered, variable-length sets of simpler shapes. In this case it is tempting to fall back on slower methods such as exhaustive correlation, graph matching, and relaxation. However, if the component shapes are themselves simple, multi-dimensional search techniques can be applied for asymptotically fast feature identification. I exploit the fact that many simple shape types can be parameterized as points in low-dimensional spaces where distance models dissimilarity. During training, shapes are clustered heuristically, first within each class and then across all classes, yielding a small set of characteristic shape distributions. Each of these is then associated with a binary feature variable that takes the value one when any input shape falls within that distribution. This mapping from a structural description to a bit-vector is an example of a feature identification method. Selecting such a mapping is slow and heuristic, but it is fully automated, applies uniformly to many shape types, and is controlled by only a few natural statistical parameters. Once selected, a mapping can be applied quickly using k-d trees. In an OCR context, large-scale, statistically significant trials have shown the technique to be superior to simpler fixed mappings.
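The mapping step described above (an unordered set of shape points in, one bit per characteristic distribution out, accelerated by a k-d tree) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the abstract does not say how a "distribution" is represented, so each one is assumed here to be a ball (a center plus a radius) in the shape-parameter space. The names `FeatureMapper` and `build_kdtree` are hypothetical, and the heuristic training-time clustering that would produce the centers and radii is not shown. The k-d tree is built once over the distribution centers. Each input shape point then searches it, pruning any subtree whose splitting plane lies farther away than the largest radius.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    index: int                # which characteristic distribution this node stores
    axis: int                 # splitting dimension at this depth
    left: Optional["Node"]    # centers with coordinate <= split value
    right: Optional["Node"]   # centers with coordinate >= split value


def build_kdtree(centers: Sequence[Point], indices: List[int], depth: int = 0) -> Optional[Node]:
    """Recursively build a k-d tree over distribution centers, splitting at the median."""
    if not indices:
        return None
    axis = depth % len(centers[indices[0]])
    ordered = sorted(indices, key=lambda i: centers[i][axis])
    mid = len(ordered) // 2
    return Node(
        index=ordered[mid],
        axis=axis,
        left=build_kdtree(centers, ordered[:mid], depth + 1),
        right=build_kdtree(centers, ordered[mid + 1:], depth + 1),
    )


class FeatureMapper:
    """Hypothetical feature identifier: bit i is 1 if any input shape lies in ball i."""

    def __init__(self, centers: Sequence[Point], radii: Sequence[float]):
        self.centers = list(centers)
        self.radii = list(radii)
        self.max_radius = max(self.radii) if self.radii else 0.0
        self.root = build_kdtree(self.centers, list(range(len(self.centers))))

    def _hits(self, node: Optional[Node], x: Point, out: Set[int]) -> None:
        """Collect the index of every ball that contains point x."""
        if node is None:
            return
        c = self.centers[node.index]
        if math.dist(x, c) <= self.radii[node.index]:
            out.add(node.index)
        diff = x[node.axis] - c[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        self._hits(near, x, out)
        # Every center on the far side is at least |diff| away from x, so it
        # cannot contain x when |diff| exceeds the largest radius.
        if abs(diff) <= self.max_radius:
            self._hits(far, x, out)

    def map(self, shapes: Sequence[Point]) -> List[int]:
        """Map an unordered, variable-length set of shape points to a bit-vector."""
        bits = [0] * len(self.centers)
        for x in shapes:
            hits: Set[int] = set()
            self._hits(self.root, x, hits)
            for i in hits:
                bits[i] = 1
        return bits


centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
mapper = FeatureMapper(centers, [1.0, 2.0, 1.5])
print(mapper.map([(0.5, 0.5), (9.0, 0.5)]))  # prints [1, 0, 1]
```

Because the tree depends only on the trained distributions, its construction cost is paid once. Each input shape then typically touches only a few tree nodes instead of being compared against every distribution, which reflects the speed argument the abstract makes for applying a selected mapping.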
Keywords: Speech Recognition; Acoustic Property; Continuous Speech; Speech Recognition System; Voice Onset Time