A template-based approach for recognition of intermittent sounds
Automatic speech and sound recognition typically involves some measure of distance between training and (possibly time-warped) test samples. Special problems arise when the spectral samples of interest are intermittent and contain temporal patterns of alternating periods of sounds and pauses that are significant for recognition. In such cases a recognizer must be capable of distinguishing between the end-points and the pauses of digitized samples and economically searching the segmented sounds for the occurrence of significant spectral patterns. The usual distance metrics based on conventional dynamic time warping algorithms may be inappropriate because time-warping often corrupts the temporal structure of the sound. The problem can be solved by first searching a test sample for distinctive temporal patterns and, if more than one match is obtained, using a spectral distance measure to classify the sample with its nearest neighbor among these. Computational advantages can be obtained if both the temporal and spectral templates are maintained in a binary format reflecting the important sound components.
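The two-stage procedure described above can be sketched as follows. This is a minimal illustration only, not the authors' implementation. The abstract does not specify how the binary templates are encoded or compared, so the bit-mask encoding, the fixed thresholds, the Hamming distance, and all names (`Template`, `binarize`, `classify`, `max_temporal_mismatch`) are assumptions. The sketch also assumes that endpoint detection has already been done and that test samples and templates cover the same number of frames and bands.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two bit masks differ."""
    return bin(a ^ b).count("1")


def binarize(values: Sequence[float], threshold: float) -> int:
    """Pack values into a bit mask: bit i is set when values[i] > threshold."""
    bits = 0
    for i, v in enumerate(values):
        if v > threshold:
            bits |= 1 << i
    return bits


@dataclass
class Template:
    label: str
    temporal: int  # bit i set = frame i contains sound (clear = pause)
    spectral: int  # bit k set = band k carries significant energy


def classify(
    frame_energy: Sequence[float],
    band_energy: Sequence[float],
    templates: Sequence[Template],
    energy_threshold: float,
    band_threshold: float,
    max_temporal_mismatch: int,
) -> Optional[str]:
    """Stage 1: keep templates whose sound/pause pattern matches the sample.
    Stage 2: if several remain, pick the nearest spectral neighbor."""
    temporal = binarize(frame_energy, energy_threshold)
    candidates = [
        t for t in templates
        if hamming(temporal, t.temporal) <= max_temporal_mismatch
    ]
    if not candidates:
        return None  # no temporal pattern matched
    if len(candidates) == 1:
        return candidates[0].label  # spectral comparison not needed
    spectral = binarize(band_energy, band_threshold)
    return min(candidates, key=lambda t: hamming(spectral, t.spectral)).label
```

Because both templates are bit masks, each comparison in this sketch reduces to one XOR and a bit count, which is one way the binary format could yield the computational advantage mentioned above. No time warping is applied, so the sound/pause structure of the sample is compared as recorded.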
Keywords: Dynamic Time Warping, Template Size, Natural Sound, Continuous Speech Recognition, Spectral Sample