A template-based approach for recognition of intermittent sounds

  • Ben Pinkowski
Track 2: Artificial Intelligence
Part of the Lecture Notes in Computer Science book series (LNCS, volume 507)


Automatic speech and sound recognition typically involves some measure of distance between training and (possibly time-warped) test samples. Special problems arise when the spectral samples of interest are intermittent and contain temporal patterns of alternating periods of sounds and pauses that are significant for recognition. In such cases a recognizer must be capable of distinguishing between the end-points and the pauses of digitized samples and economically searching the segmented sounds for the occurrence of significant spectral patterns. The usual distance metrics based on conventional dynamic time warping algorithms may be inappropriate because time-warping often corrupts the temporal structure of the sound. The problem can be solved by first searching a test sample for distinctive temporal patterns and, if more than one match is obtained, using a spectral distance measure to classify the sample with its nearest neighbor among these. Computational advantages can be obtained if both the temporal and spectral templates are maintained in a binary format reflecting the important sound components.
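
The two-stage scheme described above can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the energy threshold used to binarize frames, the unwarped sliding-window match, and the use of Hamming distance as the spectral measure are all assumptions chosen for illustration. Temporal templates are binary sound/pause sequences, and spectral templates are equal-length binary vectors marking the important frequency components.

```python
from typing import Dict, List, Optional, Tuple

Bits = List[int]


def binarize(energies: List[float], threshold: float) -> Bits:
    """Mark each frame as sound (1) or pause (0). The threshold is an assumed parameter."""
    return [1 if e >= threshold else 0 for e in energies]


def contains_pattern(sequence: Bits, pattern: Bits, max_mismatch: int = 0) -> bool:
    """Slide the pattern over the sequence without time warping.

    Returns True if some alignment differs in at most max_mismatch frames.
    """
    n, m = len(sequence), len(pattern)
    for start in range(n - m + 1):
        window = sequence[start:start + m]
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches <= max_mismatch:
            return True
    return False


def hamming(a: Bits, b: Bits) -> int:
    """Count differing positions; assumes both vectors have the same length."""
    return sum(x != y for x, y in zip(a, b))


def classify(
    test_temporal: Bits,
    test_spectral: Bits,
    templates: Dict[str, Tuple[Bits, Bits]],
    max_mismatch: int = 0,
) -> Optional[str]:
    """Stage 1: keep the labels whose temporal template occurs in the test sample.
    Stage 2: if several remain, pick the nearest neighbor by spectral distance.
    """
    candidates = [
        label
        for label, (temporal, _) in templates.items()
        if contains_pattern(test_temporal, temporal, max_mismatch)
    ]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    return min(candidates, key=lambda label: hamming(test_spectral, templates[label][1]))
```

Because both template types are binary, each comparison reduces to counting differing bits, which is one plausible reading of the computational advantage the abstract mentions. In a packed-integer representation the same count could be obtained with XOR and a population count.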


Dynamic Time Warping; Template Size; Natural Sound; Continuous Speech Recognition; Spectral Sample
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 1991

Authors and Affiliations

  • Ben Pinkowski (1)
  1. Computer Science Department, Western Michigan University, Kalamazoo
