A noise-robust auditory modelling front end for voiced speech
We present a method for detecting and displaying the voiced elements of speech, using the amplitude-modulated pulses that arise from unresolved harmonics of the excitation (fundamental) frequency. The method uses an auditory model consisting of a gammatone filterbank (modelling the basilar membrane), simple rectification (modelling the inner hair cells of the organ of Corti), envelope bandpass filters (modelling some spiral ganglion neuron effects) and amplitude modulation detectors (modelling certain cell populations in the cochlear nucleus). We demonstrate that it can display a pattern of activity across the spectrum and across time that describes the energy distribution in voiced speech, and that this pattern degrades slowly in the presence of non-speech noise.
Keywords: Hair Cell, Amplitude Modulation, Sound Pressure Level, Auditory Processing, Basilar Membrane
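The processing chain named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no filter parameters or detector design, so everything below is an assumption made for illustration. The gammatone filters are fourth order with bandwidths taken from the widely used Glasberg and Moore ERB approximation, rectification is half-wave, and the envelope bandpass and amplitude modulation detection stages are collapsed into a single Fourier projection of the mean-removed rectified signal at each candidate modulation rate. All function names (`gammatone_ir`, `am_profile`, and so on) are hypothetical.

```python
import math


def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Sampled, peak-normalised impulse response of a gammatone filter at fc (Hz)."""
    b = 1.019 * erb(fc)  # assumed bandwidth scaling for a 4th-order filter
    n = int(duration * fs)
    ir = [(k / fs) ** (order - 1)
          * math.exp(-2.0 * math.pi * b * k / fs)
          * math.cos(2.0 * math.pi * fc * k / fs)
          for k in range(n)]
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]


def fir_filter(signal, ir):
    """Causal FIR filtering by direct convolution; output has the input's length."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j in range(min(i + 1, len(ir))):
            acc += ir[j] * signal[i - j]
        out.append(acc)
    return out


def half_wave_rectify(signal):
    """Keep positive values and zero the rest (simple rectification stage)."""
    return [v if v > 0.0 else 0.0 for v in signal]


def modulation_strength(signal, fs, fm):
    """Amplitude of the mean-removed signal's component at modulation rate fm (Hz).

    Stand-in for the envelope bandpass filter plus AM detector stages.
    """
    n = len(signal)
    mean = sum(signal) / n
    re = sum((v - mean) * math.cos(2.0 * math.pi * fm * i / fs)
             for i, v in enumerate(signal))
    im = sum((v - mean) * math.sin(2.0 * math.pi * fm * i / fs)
             for i, v in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


def am_profile(signal, fs, centre_freqs, candidate_rates):
    """Return {centre frequency: {modulation rate: strength}} for each channel."""
    profile = {}
    for fc in centre_freqs:
        rectified = half_wave_rectify(fir_filter(signal, gammatone_ir(fc, fs)))
        profile[fc] = {fm: modulation_strength(rectified, fs, fm)
                       for fm in candidate_rates}
    return profile


def harmonic_complex(f0, fs, duration, n_harmonics):
    """Synthetic voiced-like test signal: equal-amplitude harmonics of f0."""
    n = int(duration * fs)
    return [sum(math.cos(2.0 * math.pi * h * f0 * i / fs)
                for h in range(1, n_harmonics + 1))
            for i in range(n)]


if __name__ == "__main__":
    fs = 8000
    rates = [80, 100, 125, 160]
    tone = harmonic_complex(100, fs, 0.2, 30)
    for fc, strengths in am_profile(tone, fs, [1000, 2000], rates).items():
        best = max(rates, key=lambda r: strengths[r])
        print(fc, "Hz channel: strongest modulation at", best, "Hz")
```

In channels wide enough to pass several harmonics at once, the rectified output beats at the fundamental, which is the cue the abstract describes. The single-frequency projection used here is only one of many ways to detect that modulation and was chosen to keep the sketch dependency-free.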