Word Spotting in Background Music: a Behavioural Study
Introduction
Speech intelligibility in realistic environments is directly linked to the ability to focus attention on the sounds of interest while discarding background noise and other competing stimuli. This work investigates task-driven auditory attention in noisy environments. Specifically, the study examines the ability to successfully execute a word spotting task while speech perception has to cope with music playing in the background.
Methods
The behavioural experiments consider different types of songs and explore how their distinct characteristics (such as dynamics or the presence of distortion effects) affect the subjects' task performance and, thus, the distribution of attention.
Results
Our results show that the ability to correctly separate the target sound from the background noise has a major impact on the subjects' performance. Songs without any distortion effect turn out to be more distracting than those with distortion, whose frequency spectrum envelope differs more from that of the narrative. Furthermore, subjects performed worst when songs characterised by high dynamics were playing in the background, as the unexpected changes capture the listener's attention.
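The finding that distorted songs interfere less because their spectral envelope differs more from that of the narrative can be illustrated with a simple measurement. The sketch below is not taken from the study: the file names, the mel-band representation, and the cosine-distance choice are illustrative assumptions, and the librosa library is used only for convenience. It compares the long-term spectral envelope of each background track with that of the spoken narrative; a larger distance would be consistent with weaker energetic masking.

```python
# Minimal sketch (not the authors' method): crude comparison of the
# long-term spectral envelope of a background song with that of the
# spoken narrative. File names and the distance measure are assumptions.
import numpy as np
import librosa


def long_term_envelope(path, sr=16000, n_mels=40):
    """Average log mel-band energy over time as a rough spectral envelope."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return np.log(mel.mean(axis=1) + 1e-10)


def envelope_distance(env_a, env_b):
    """Cosine distance between two envelopes; larger values indicate less
    spectral overlap and, plausibly, weaker energetic masking."""
    cos = np.dot(env_a, env_b) / (np.linalg.norm(env_a) * np.linalg.norm(env_b))
    return 1.0 - cos


speech_env = long_term_envelope("narrative.wav")          # hypothetical file
clean_env = long_term_envelope("song_no_distortion.wav")  # hypothetical file
dist_env = long_term_envelope("song_distortion.wav")      # hypothetical file

print(f"speech vs. clean song:     {envelope_distance(speech_env, clean_env):.3f}")
print(f"speech vs. distorted song: {envelope_distance(speech_env, dist_env):.3f}")
```

Under this reading, a larger envelope distance for the distorted track would mirror the behavioural result that such songs disrupt the word spotting task less than undistorted ones.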
Keywords: Speech perception · Auditory attention · Word spotting · Cocktail party · Auditory masking · Music perception
Compliance with Ethical Standards
Conflict of Interest
The authors declare that they have no conflict of interest.
Ethical Approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
Informed Consent
Informed consent was obtained from all individual participants included in the study.