Abstract
Keyword identification in one of two simultaneous sentences is improved when the sentences differ in F0, particularly when they are almost continuously voiced. Sentences of this kind were recorded, monotonised using PSOLA, and re-synthesised to give a range of harmonic ∆F0s (0, 1, 3, and 10 semitones). They were additionally re-synthesised by LPC with the LPC residual frequency shifted by 25% of F0, to give excitation with inharmonic but regularly spaced components. Perceptual identification of frequency-shifted sentences showed a similar large improvement with nominal ∆F0 as seen for harmonic sentences, although overall performance was about 10% poorer. We compared performance with that of two autocorrelation-based computational models comprising four stages: (i) peripheral frequency selectivity and half-wave rectification; (ii) within-channel periodicity extraction; (iii) identification of the two major peaks in the summary autocorrelation function (SACF); (iv) a template-based approach to speech recognition using dynamic time warping. One model sampled the correlogram at the target-F0 period and performed spectral matching; the other deselected channels dominated by the interferer and performed matching on the short-lag portion of the residual SACF. Both models reproduced the monotonic increase observed in human performance with increasing ∆F0 for the harmonic stimuli, but not for the frequency-shifted stimuli. A revised version of the spectral-matching model, which groups patterns of periodicity that lie on a curve in the frequency-delay plane, showed a closer match to the perceptual data for frequency-shifted sentences. The results extend the range of phenomena originally attributed to harmonic processing to grouping by common spectral pattern.
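As a rough illustration of the stimulus manipulation described above, the following sketch lists the component frequencies of a harmonic excitation and of a frequency-shifted one, and converts the ∆F0 values in semitones to frequency ratios. The 100 Hz F0, the upward direction of the shift, and all function names are assumptions made for illustration; none of them is taken from the chapter, and the sketch does not reproduce the PSOLA or LPC processing itself.

```python
def harmonic_components(f0, n):
    """Frequencies (Hz) of the first n harmonics of f0."""
    return [k * f0 for k in range(1, n + 1)]


def shifted_components(f0, n, shift_fraction=0.25):
    """Harmonics of f0, each shifted by shift_fraction * f0.

    An upward shift is assumed here; the abstract gives only the size
    of the shift (25% of F0), not its direction.
    """
    shift = shift_fraction * f0
    return [f + shift for f in harmonic_components(f0, n)]


def semitones_to_ratio(semitones):
    """Frequency ratio corresponding to an interval in semitones."""
    return 2.0 ** (semitones / 12.0)


F0 = 100.0  # hypothetical F0 in Hz, chosen only to keep the arithmetic simple

harmonic = harmonic_components(F0, 4)
shifted = shifted_components(F0, 4)

# Spacing between adjacent shifted components is still F0 (regular spacing),
# but the components are no longer integer multiples of F0 (inharmonic).
spacings = [b - a for a, b in zip(shifted, shifted[1:])]

# F0 ratios for the delta-F0 conditions named in the abstract.
delta_f0_ratios = {st: semitones_to_ratio(st) for st in (0, 1, 3, 10)}

if __name__ == "__main__":
    print("harmonic:", harmonic)
    print("shifted: ", shifted)
    print("spacings:", spacings)
    for st, ratio in delta_f0_ratios.items():
        print(f"{st:>2} semitones -> F0 ratio {ratio:.4f}")
```

Under these assumptions the shifted components keep the same spacing as the harmonics while ceasing to be integer multiples of F0, which is the sense in which the excitation is "inharmonic but regularly spaced".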
Copyright information
© 2010 Springer Science+Business Media, LLC
About this paper
Cite this paper
Roberts, B., Holmes, S.D., Darwin, C.J., Brown, G.J. (2010). Perception of Concurrent Sentences with Harmonic or Frequency-Shifted Voiced Excitation: Performance of Human Listeners and of Computational Models Based on Autocorrelation. In: Lopez-Poveda, E., Palmer, A., Meddis, R. (eds) The Neurophysiological Bases of Auditory Perception. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-5686-6_48
DOI: https://doi.org/10.1007/978-1-4419-5686-6_48
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-5685-9
Online ISBN: 978-1-4419-5686-6
eBook Packages: Biomedical and Life Sciences (R0)