Abstract
With the advent of new generations of personal assistants integrated with voice-controlled devices (e.g., Apple Siri, Google Assistant, Amazon Alexa), the demand for efficient mechanisms to detect, localize, and recognize the source of sound events is rising. As such, microphone-array-based devices using improved algorithms are of interest to the research community. In this context, the recent success of deep learning algorithms in various domains (e.g., computer vision, speech recognition) opens the door to their application to the Sound Event Localization and Detection (SELD) problem. Here, the challenge lies in effectively combining deep neural networks (DNNs) with embedded devices that drive specific configurations of microphone arrays. In this work, we propose the QuadCOIN system, an embedded system that executes the algorithms needed to detect and localize a sound event in the surrounding space, and that exploits a specific arrangement of microphones to improve the precision of the estimated sound source position. Specifically, our system is composed of an embedded computing device coupled with four groups of microphones (i.e., four microphone arrays), each arranged as a small grid of four sensing elements. The embedded computing device collects the event localization estimates from the four groups of sensors and then determines the exact position of the sound source. To this end, each group of microphones runs a cutting-edge Convolutional Neural Network (CNN), which allows it to detect events of interest. The CNN has been trained using datasets generated through a framework developed in-house. As proof of the feasibility of the proposed system, we implemented it on low-cost hardware composed of a single-board computer (SBC) and four ST-BlueCOIN microphone arrays.
Experimental results obtained on the QuadCOIN system demonstrate its precision and accuracy in detecting sound events and localizing the corresponding sound sources.
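The abstract does not say how the embedded device combines the four per-array estimates into a single position. As a purely illustrative sketch, one common way to fuse such estimates is a least-squares intersection of bearing lines. The sketch below assumes that each array reports a 2D azimuth toward the source and that the array positions are known; the function name `triangulate`, the room layout, and the azimuth convention are all hypothetical and are not taken from the paper.

```python
import math


def triangulate(positions, azimuths_deg):
    """Least-squares intersection of 2D bearing lines (illustrative only).

    positions: (x, y) location of each microphone array (assumed known).
    azimuths_deg: bearing reported by each array, in degrees measured
        counter-clockwise from the +x axis (assumed convention).
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), az in zip(positions, azimuths_deg):
        dx = math.cos(math.radians(az))
        dy = math.sin(math.radians(az))
        # I - d d^T projects onto the normal of the bearing line, so
        # summing these terms minimizes the squared distance to all lines.
        m11, m12, m22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * px + m12 * py
        b2 += m12 * px + m22 * py
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("bearings are (nearly) parallel; position not observable")
    # Solve the symmetric 2x2 normal equations with Cramer's rule.
    x = (a22 * b1 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    return x, y


# Hypothetical layout: four arrays at the corners of a 4 m x 4 m area.
corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
source = (1.0, 3.0)
bearings = [math.degrees(math.atan2(source[1] - py, source[0] - px))
            for px, py in corners]
estimate = triangulate(corners, bearings)
```

With noise-free bearings the lines meet at a single point and the estimate coincides with the source. With noisy bearings, the same normal equations return the point closest to all four lines in the least-squares sense, which is one reason several arrays can localize more precisely than one.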
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ciccia, S., Scionti, A., Vitali, G., Terzo, O. (2021). QuadCOINS-Network: A Deep Learning Approach to Sound Source Localization. In: Barolli, L., Poniszewska-Maranda, A., Enokido, T. (eds) Complex, Intelligent and Software Intensive Systems. CISIS 2020. Advances in Intelligent Systems and Computing, vol 1194. Springer, Cham. https://doi.org/10.1007/978-3-030-50454-0_13
DOI: https://doi.org/10.1007/978-3-030-50454-0_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50453-3
Online ISBN: 978-3-030-50454-0
eBook Packages: Intelligent Technologies and Robotics (R0)