A noise-robust auditory modelling front end for voiced speech

  • Leslie S. Smith
Part I: Coding and Learning in Biology
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1327)


We present a method for detecting and displaying the voiced elements of speech. It exploits the amplitude-modulated pulses produced by unresolved harmonics of the excitation (fundamental) frequency. The method uses an auditory model consisting of a gammatone filterbank (modelling the basilar membrane), simple rectification (modelling the inner hair cells of the organ of Corti), envelope bandpass filters (modelling some spiral ganglion neuron effects) and amplitude modulation detectors (modelling certain cell populations in the cochlear nucleus). We demonstrate that the model can display a pattern of activity across the spectrum and across time that describes the energy distribution in voiced speech, and that this pattern degrades only slowly in the presence of non-speech noise.
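
The four stages named in the abstract can be sketched for a single channel roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the parameter values (a 4th-order gammatone, a 50 to 400 Hz envelope band, a 16 kHz sample rate), the crude band-pass built from two one-pole low-pass filters, and the single-bin DFT used as a stand-in for an amplitude modulation detector are all choices made here for illustration. The bandwidth formula is the ERB expression of Glasberg and Moore (1990).

```python
import math

def erb_hz(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.02, order=4):
    """Sampled gammatone impulse response: t^(n-1) exp(-2 pi b t) cos(2 pi fc t)."""
    b = 1.019 * erb_hz(fc)  # assumed bandwidth scaling
    return [
        (k / fs) ** (order - 1)
        * math.exp(-2.0 * math.pi * b * k / fs)
        * math.cos(2.0 * math.pi * fc * k / fs)
        for k in range(round(duration * fs))
    ]

def fir_filter(x, h):
    """Causal FIR filtering by direct convolution, truncated to len(x)."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(min(n + 1, len(h))):
            acc += h[k] * x[n - k]
        y.append(acc)
    return y

def one_pole_lowpass(x, cutoff, fs):
    """First-order recursive low-pass filter with unity gain at DC."""
    a = math.exp(-2.0 * math.pi * cutoff / fs)
    y, state = [], 0.0
    for v in x:
        state = (1.0 - a) * v + a * state
        y.append(state)
    return y

def envelope_bandpass(x, fs, low=50.0, high=400.0):
    """Crude band-pass (assumption): difference of two one-pole low-passes."""
    upper = one_pole_lowpass(x, high, fs)
    lower = one_pole_lowpass(x, low, fs)
    return [u - l for u, l in zip(upper, lower)]

def modulation_index(x, fs, fc, f_mod, skip=0.05):
    """Relative strength of envelope modulation at f_mod in one channel."""
    channel = fir_filter(x, gammatone_ir(fc, fs))   # basilar membrane stage
    rectified = [max(v, 0.0) for v in channel]      # half-wave rectification
    envelope = envelope_bandpass(rectified, fs)     # envelope band-pass stage
    start = round(skip * fs)                        # discard onset transient
    seg = envelope[start:]
    n = len(seg)
    w = 2.0 * math.pi * f_mod / fs
    re = sum(v * math.cos(w * k) for k, v in enumerate(seg))
    im = sum(v * math.sin(w * k) for k, v in enumerate(seg))
    mean_level = sum(rectified[start:]) / n
    return 2.0 * math.hypot(re, im) / n / mean_level

FS = 16000
t = [k / FS for k in range(round(0.1 * FS))]
# Harmonics 18 to 22 of a 100 Hz fundamental: unresolved in a 2 kHz channel.
voiced_like = [sum(math.cos(2.0 * math.pi * 100.0 * h * ti) for h in range(18, 23))
               for ti in t]
# A single 2 kHz tone carries no 100 Hz envelope modulation.
tone = [math.cos(2.0 * math.pi * 2000.0 * ti) for ti in t]

am_voiced = modulation_index(voiced_like, FS, 2000.0, 100.0)
am_tone = modulation_index(tone, FS, 2000.0, 100.0)
print(am_voiced > am_tone)
```

Several harmonics fall inside the 2 kHz channel's passband, so they beat at the fundamental and the rectified envelope is modulated at 100 Hz; the pure tone produces no such modulation. A full front end would repeat this over many centre frequencies and over successive time frames to build the spectrum-by-time display the abstract describes.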


Keywords: Hair Cell; Amplitude Modulation; Sound Pressure Level; Auditory Processing; Basilar Membrane
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Leslie S. Smith (1)
  1. Department of Computing Science and Mathematics, University of Stirling, Stirling, Scotland
