Altering the rhythm of target and background talkers differentially affects speech understanding

Abstract

Three experiments investigated listeners’ ability to use speech rhythm to attend selectively to a single target talker presented in multi-talker babble (Experiments 1 and 2) and in speech-shaped noise (Experiment 3). Participants listened to spoken sentences of the form “Ready [Call sign] go to [Color] [Number] now” and reported the Color and Number spoken by a target talker (cued by the Call sign “Baron”). Experiment 1 altered the natural rhythm of the target talker and the background talkers for two-talker and six-talker backgrounds. Experiment 2 examined parametric rhythm alterations over a wider range, applied to either the target talker or the background talkers. Experiments 1 and 2 revealed that altering the rhythm of the target talker, while keeping the rhythm of the background intact, reduced listeners’ ability to report the Color and Number spoken by the target talker. Conversely, altering the rhythm of the background talkers, while keeping the target rhythm intact, improved listeners’ ability to report the target Color and Number. Experiment 3, which embedded the target talker in speech-shaped noise rather than multi-talker babble, likewise showed that recognition of the target sentence declined as the alteration of the target rhythm increased. This pattern of results favors a selective-entrainment hypothesis based on dynamic attending theory over a disparity-based segregation hypothesis and an increased-salience hypothesis.
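
To make the manipulation concrete, the sketch below retimes a signal by stretching or compressing each interval between annotated segment boundaries, which is one way to alter a sentence’s rhythm while leaving its content intact. It is a minimal illustration under stated assumptions, not the authors’ pipeline: the boundaries and stretch factors are invented, and librosa’s phase-vocoder time stretching stands in for the pitch-synchronous (PSOLA-style) processing typically used for such manipulations (Moulines & Charpentier, 1990).

```python
import numpy as np
import librosa  # phase-vocoder time stretching; a stand-in for PSOLA


def alter_rhythm(y, sr, boundaries_s, stretch_factors):
    """Retime a signal by stretching each segment by its own factor.

    boundaries_s: segment boundaries in seconds (e.g., word onsets);
    stretch_factors: one duration factor per segment (>1 lengthens),
    so len(stretch_factors) == len(boundaries_s) + 1.
    """
    cut_points = (np.asarray(boundaries_s) * sr).astype(int)
    segments = np.split(y, cut_points)
    retimed = [librosa.effects.time_stretch(seg, rate=1.0 / f)  # rate < 1 lengthens
               for seg, f in zip(segments, stretch_factors)]
    return np.concatenate(retimed)


# Hypothetical usage: placeholder noise stands in for a recorded sentence;
# the middle "word" is lengthened 20% and the final one shortened 20%,
# perturbing the rhythm without removing any content.
sr = 16000
y = 0.01 * np.random.randn(2 * sr).astype(np.float32)
y_altered = alter_rhythm(y, sr, boundaries_s=[0.6, 1.3],
                         stretch_factors=[1.0, 1.2, 0.8])
```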

Notes

  1. The 0 and -2 dB SNRs for the two- and six-talker backgrounds were selected to equate performance in recognizing both Color and Number at a level of approximately 50% correct in the unaltered-rhythm conditions. This performance level agrees well with the psychometric properties of CRM-sentence recognition in noise reported in previous studies. Eddins and Liu (2012) measured recognition of Color and Number in CRM sentences presented in two- and four-talker babble; the SNRs required to reach 50% correct for Color and Number recognition were -1.3 dB and -5.4 dB, respectively. As in the current study, a lower SNR was needed to equate task performance when there were more background talkers. This also generally agrees with other studies that measured open-set speech recognition in multi-talker backgrounds. For example, Rosen et al. (2013) measured sentence recognition in multi-talker babble at fixed SNRs of -6 and -2 dB as the number of background talkers increased from 2 to 16; recognition gradually improved by about 2% for every doubling of the number of background talkers. Freyman et al. (2004) showed that the SNR required to reach 50% correct sentence recognition decreased from approximately -0.5 dB for two-talker backgrounds to approximately -3 dB for six-talker backgrounds.
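
As a concrete note on how such SNRs are typically realized, the sketch below scales a masker relative to a target on an RMS basis before summing, so that, for example, -2 dB leaves the babble 2 dB more intense than the target. This is a minimal sketch of the standard computation, with invented function and variable names rather than anything taken from the paper.

```python
import numpy as np


def mix_at_snr(target, masker, snr_db):
    """Scale the masker so the target-to-masker RMS ratio equals snr_db,
    then sum the two signals (0 dB means equal RMS levels)."""
    rms_target = np.sqrt(np.mean(target ** 2))
    rms_masker = np.sqrt(np.mean(masker ** 2))
    gain = (rms_target / rms_masker) * 10.0 ** (-snr_db / 20.0)
    return target + gain * masker


# Hypothetical usage: placeholder noise stands in for a CRM sentence and
# a multi-talker babble; at -2 dB SNR the scaled babble exceeds the target.
rng = np.random.default_rng(0)
target = 0.05 * rng.standard_normal(16000)
babble = 0.08 * rng.standard_normal(16000)
mixture = mix_at_snr(target, babble, snr_db=-2.0)
```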

References

  1. Aubanel, V., Davis, C., & Kim, J. (2016). Exploring the role of brain oscillations in speech perception in noise: Intelligibility of isochronously retimed speech. Frontiers in Human Neuroscience, 10, 430.

  2. Baese-Berk, M. M., Dilley, L. C., Henry, M. J., Vinke, L., & Banzina, E. (2019). Not just a function of function words: Distal speech rate influences perception of prosodically weak syllables. Attention, Perception, & Psychophysics, 81, 571-589.

  3. Barnes, R., & Jones, M. R. (2000). Expectancy, attention, and time. Cognitive Psychology, 41, 254-311.

  4. Bolia, R. S., Nelson, W. T., Ericson, M. A., & Simpson, B. D. (2000). A speech corpus for multitalker communications research. The Journal of the Acoustical Society of America, 107, 1065-1066.

  5. Bregman, A. S. (1990). Auditory scene analysis. Cambridge, MA: MIT Press.

  6. Bregman, A. S., Abramson, J., Doehring, P., & Darwin, C. J. (1985). Spectral integration based on common amplitude modulation. Perception & Psychophysics, 37, 483-493.

  7. Calandruccio, L., Dhar, S., & Bradlow, A. R. (2010). Speech-on-speech masking with variable access to the linguistic content of the masker speech. The Journal of the Acoustical Society of America, 128, 860-869.

  8. Calandruccio, L., Bradlow, A. R., & Dhar, S. (2014). Speech-on-speech masking with variable access to the linguistic content of the masker speech for native and non-native speakers of English. Journal of the American Academy of Audiology, 25, 355-366.

  9. Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and with two ears. The Journal of the Acoustical Society of America, 25, 975-979.

  10. Darwin, C. J. (1975). On the dynamic use of prosody in speech perception. Haskins Laboratories Status Report on Speech Research, 42-43, 103-115.

  11. Dauer, R. M. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics, 11, 51-62.

  12. Dilley, L. C., & McAuley, J. D. (2008). Distal prosodic context affects word segmentation and lexical processing. Journal of Memory and Language, 59, 294-311.

  13. Ding, N., & Simon, J. Z. (2012). Emergence of neural encoding of auditory objects while listening to competing speakers. Proceedings of the National Academy of Sciences, 109, 11854-11859.

  14. Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19, 158.

  15. Eddins, D. A., & Liu, C. (2012). Psychometric properties of the coordinate response measure corpus with various types of background interference. The Journal of the Acoustical Society of America, 131, EL177-EL183.

  16. Freyman, R. L., Balakrishnan, U., & Helfer, K. S. (2004). Effect of number of masking talkers and auditory priming on informational masking in speech recognition. The Journal of the Acoustical Society of America, 115, 2246-2256.

  17. Ghitza, O. (2011). Linking speech perception and neurophysiology: Speech decoding guided by cascaded oscillators locked to the input rhythm. Frontiers in Psychology, 2, 130.

  18. Giraud, A. L., & Poeppel, D. (2012). Cortical oscillations and speech processing: Emerging computational principles and operations. Nature Neuroscience, 15, 511.

  19. Golumbic, E. M. Z., Ding, N., Bickel, S., Lakatos, P., Schevon, C. A., McKhann, G. M., Simon, J. Z., Poeppel, D., & Schroeder, C. E. (2013). Mechanisms underlying selective neuronal tracking of attended speech at a “cocktail party”. Neuron, 77, 980-991.

  20. Golumbic, E. M. Z., Poeppel, D., & Schroeder, C. E. (2012). Temporal context in speech processing and attentional stream selection: A behavioral and neural perspective. Brain and Language, 122, 151-161.

  21. Horton, C., D'Zmura, M., & Srinivasan, R. (2013). Suppression of competing speech through entrainment of cortical oscillations. Journal of Neurophysiology, 109, 3082-3093.

  22. Huggins, A. W. (1972). On the perception of temporal phenomena in speech. The Journal of the Acoustical Society of America, 51, 1279-1290.

  23. Humes, L. E., Kidd, G. R., & Fogerty, D. (2017). Exploring use of the coordinate response measure in a multitalker babble paradigm. Journal of Speech, Language, and Hearing Research, 60, 741-754.

  24. Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory. Psychological Review, 83, 323-355.

  25. Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review, 96, 459-491.

  26. Jones, M. R., Kidd, G., & Wetzel, R. (1981). Evidence for rhythmic attention. Journal of Experimental Psychology: Human Perception and Performance, 7, 1059-1073.

  27. Jones, M. R., Moynihan, H., MacKenzie, N., & Puente, J. (2002). Temporal aspects of stimulus-driven attending in dynamic arrays. Psychological Science, 13, 313-319.

  28. Kashino, M., & Hirahara, T. (1996). One, two, many—Judging the number of concurrent talkers. The Journal of the Acoustical Society of America, 99, 2596-2603.

  29. Kidd, G. R. (1989). Articulatory-rate context effects in phoneme identification. Journal of Experimental Psychology: Human Perception and Performance, 15, 736-748.

  30. Kidd, G. R., & Humes, L. E. (2014). Tempo-based segregation of spoken sentences. The Journal of the Acoustical Society of America, 136, 2311-2311.

  31. Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review, 106, 119-159.

  32. Marin, C., & McAdams, S. (1991). Segregation of concurrent sounds: II. Effects of spectral envelope tracing, frequency modulation coherence, and frequency modulation width. The Journal of the Acoustical Society of America, 89, 341-351.

  33. McAdams, S. (1989). Segregation of concurrent sounds, I: Effects of frequency modulation coherence. The Journal of the Acoustical Society of America, 86, 2148-2159.

  34. McAuley, J. D., & Jones, M. R. (2003). Modeling effects of rhythmic context on perceived duration: A comparison of interval and entrainment approaches to short-interval timing. Journal of Experimental Psychology: Human Perception and Performance, 29, 1102-1125.

  35. McAuley, J. D., Jones, M. R., Holub, S., Johnston, H. M., & Miller, N. S. (2006). The time of our lives: Life span development of timing and event tracking. Journal of Experimental Psychology: General, 135, 348-367.

  36. Middlebrooks, J. C., Simon, J. Z., Popper, A. N., & Fay, R. R. (Eds.). (2017). The auditory system at the cocktail party (Vol. 60). Cham, Switzerland: Springer.

  37. Miller, J., Carlson, L., & McAuley, J. D. (2013). When what you hear influences when you see: Listening to an auditory rhythm influences the temporal allocation of visual attention. Psychological Science, 24, 11-18.

  38. Morrill, T. H., Dilley, L. C., McAuley, J. D., & Pitt, M. A. (2014). Distal rhythm influences whether or not listeners hear a word in continuous speech: Support for a perceptual grouping hypothesis. Cognition, 131, 69-74.

  39. Moulines, E., & Charpentier, F. (1990). Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication, 9, 453-467.

  40. Peelle, J. E., & Davis, M. H. (2012). Neural oscillations carry speech rhythm through to comprehension. Frontiers in Psychology, 3, 320.

  41. Peelle, J. E., Gross, J., & Davis, M. H. (2013). Phase-locked responses to speech in human auditory cortex are enhanced during comprehension. Cerebral Cortex, 23, 1378-1387.

  42. Poeppel, D. (2003). The analysis of speech in different temporal integration windows: Cerebral lateralization as ‘asymmetric sampling in time.’ Speech Communication, 41, 245-255.

  43. Richards, V. M., Shen, Y., & Chubb, C. (2013). Level dominance for the detection of changes in level distribution in sound streams. The Journal of the Acoustical Society of America, 134, EL237-EL243.

  44. Riecke, L., Formisano, E., Sorger, B., Baskent, D., & Gaudrain, E. (2018). Neural entrainment to speech modulates speech intelligibility. Current Biology, 28, 161-169.

  45. Rimmele, J. M., Golumbic, E. Z., Schröger, E., & Poeppel, D. (2015). The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene. Cortex, 68, 144-154.

  46. Rimmele, J. M., Jolsvai, H., & Sussman, E. (2011). Auditory target detection is affected by implicit temporal and spatial expectations. Journal of Cognitive Neuroscience, 23, 1136-1147.

  47. Rimmele, J. M., Schröger, E., & Bendixen, A. (2012). Age-related changes in the use of regular patterns for auditory scene analysis. Hearing Research, 289, 98-107.

  48. Rosen, S., Souza, P., Ekelund, C., & Majeed, A. A. (2013). Listening to speech in a background of other talkers: Effects of talker number and noise vocoding. The Journal of the Acoustical Society of America, 133, 2431-2443.

  49. Rosen, S. (1992). Temporal information in speech: Acoustic, auditory and linguistic aspects. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 336(1278), 367-373.

  50. Smith, M. R., Cutler, A., Butterfield, S., & Nimmo-Smith, I. (1989). Perception of rhythm and word boundaries in noise-masked speech. Journal of Speech, Language, and Hearing Research, 32, 912-920.

  51. Snyder, J. S., Gregg, M. K., Weintraub, D. M., & Alain, C. (2012). Attention, awareness, and the perception of auditory scenes. Frontiers in Psychology, 3, 17.

  52. Tilsen, S., & Arvaniti, A. (2013). Speech rhythm analysis with decomposition of the amplitude envelope: Characterizing rhythmic patterns within and across languages. The Journal of the Acoustical Society of America, 134, 628-639.

  53. Van Engen, K. J., & Bradlow, A. R. (2007). Sentence recognition in native- and foreign-language multi-talker background noise. The Journal of the Acoustical Society of America, 121, 519-526.

  54. Wang, M., Kong, L., Zhang, C., Wu, X., & Li, L. (2018). Speaking rhythmically improves speech recognition under “cocktail-party” conditions. The Journal of the Acoustical Society of America, 143, EL255-EL259.

  55. Zeng, F., Nie, K., Stickney, G. S., Kong, Y., Vongphoe, M., Bhargave, A., … Cao, K. (2005). Speech recognition with amplitude and frequency modulations. Proceedings of the National Academy of Sciences of the United States of America, 102, 2293-2298.

  56. Zhong, X., & Yost, W. A. (2017). How many images are in an auditory scene? The Journal of the Acoustical Society of America, 141, 2882-2892.

Acknowledgements

The authors thank Audrey Drotos, Anusha Mamidipaka, and Paul Clancy for their assistance with data collection and for their insights and many helpful comments over the course of the project; Dylan V. Pearson at Indiana University for assisting with stimulus generation; and members of the Timing, Attention and Perception Lab at Michigan State University for their helpful suggestions and comments at various stages of this project. NIH Grant R01DC013538 (PIs: Gary R. Kidd and J. Devin McAuley) supported this research.

Open Practices Statement

The data and materials for all experiments will be made available at http://taplab.psy.msu.edu. None of the experiments were pre-registered.

Author information

Corresponding author

Correspondence to J. Devin McAuley.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

McAuley, J.D., Shen, Y., Dec, S. et al. Altering the rhythm of target and background talkers differentially affects speech understanding. Atten Percept Psychophys (2020). https://doi.org/10.3758/s13414-020-02064-5

Keywords

  • Speech perception
  • Attention: Selective
  • Temporal processing