Word Spotting in Background Music: a Behavioural Study

  • Letizia Marchegiani
  • Xenofon Fafoutis


Introduction

Speech intelligibility in realistic environments is directly correlated with the ability to focus attention on the sounds of interest while discarding background noise and other competing stimuli. This work investigates task-driven auditory attention in noisy environments. Specifically, it focuses on the ability to execute a word spotting task successfully while speech perception has to cope with music playing in the background.

Methods

The behavioural experiments consider different types of songs and explore how their distinct characteristics (such as dynamics or the presence of distortion sound effects) affect the subjects’ task performance and, thus, the distribution of attention.

Results

Our results show that the ability to correctly separate the target sound from the background noise has a major impact on the subjects’ performance. Indeed, songs without any distortion effect proved more distracting than those with distortion, whose frequency spectrum envelope differs more from that of the narrative. Furthermore, subjects performed worst when songs characterised by high dynamics were playing in the background, because unexpected changes capture the listener’s attention.


Keywords: Speech perception, Auditory attention, Word spotting, Cocktail party, Auditory masking, Music perception


Compliance with Ethical Standards

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed Consent

Informed consent was obtained from all individual participants included in the study.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Electronic Systems, Aalborg University, Aalborg Ø, Denmark
  2. Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby, Denmark
