QuadCOINS-Network: A Deep Learning Approach to Sound Source Localization

Ciccia, Simone; Scionti, Alberto; Vitali, Giacomo; Terzo, Olivier

doi:10.1007/978-3-030-50454-0_13

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1194)

Included in the following conference series:

Conference on Complex, Intelligent, and Software Intensive Systems

Abstract

With the advent of new generations of personal assistants integrated with voice-controlled devices (e.g., Apple Siri, Google Assistant, Amazon Alexa), the demand for efficient mechanisms to detect, localize and recognize the source of sound events is rising. As such, microphone-array-based devices using improved algorithms are of interest to the research community. In this context, the recent success of deep learning algorithms in various domains (e.g., computer vision, speech recognition) opens the door to their application to the Sound Event Localization and Detection (SELD) problem. Here, the challenge lies in effectively combining deep neural networks (DNNs) with embedded devices driving specific configurations of microphone arrays. In this work, we propose the QuadCOIN system, an embedded system that executes the algorithms needed to detect and localize a sound event in the surrounding space, and that exploits a specific arrangement of microphones to improve the precision of the estimated sound source position. Specifically, our system is composed of an embedded computing device coupled with four groups of microphones (i.e., four microphone arrays), each arranged as a small grid of four sensing elements. The embedded computing device collects the event localization estimates from the four groups of sensors and then determines the exact position of the sound source. To this end, each group of microphones runs a cutting-edge Convolutional Neural Network (CNN), which detects events of interest. The CNN has been trained using datasets generated through a framework developed in-house. As proof of the feasibility of the proposed system, we implemented it on low-cost hardware composed of a single-board computer (SBC) and four ST-BlueCOIN microphone arrays. Experimental results obtained on the QuadCOIN system demonstrate its precision and accuracy in detecting sound events and localizing the corresponding sound sources.
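
As an illustration only, the fusion step described in the abstract (four arrays each producing a localization estimate, which the embedded device then combines) could be sketched as a least-squares intersection of bearing lines. The abstract does not say how the estimates are combined or what form they take, so everything below is an assumption: the 2D geometry, the one-azimuth-per-array output, the array coordinates, and the function name `locate_source` are hypothetical and are not taken from the paper.

```python
import math


def locate_source(array_positions, azimuths_rad):
    """Least-squares intersection of 2D bearing lines (illustrative sketch).

    array_positions: (x, y) of each microphone array, assumed known.
    azimuths_rad: direction of arrival estimated at each array, in radians,
                  counter-clockwise from the +x axis (assumed convention).
    Each bearing is treated as an infinite line, so the front/back sense
    of the direction is ignored.
    """
    # Build the 2x2 normal equations M x = v,
    # where M = sum(n n^T), v = sum(n (n . p)), and n is the unit normal
    # to the bearing line through array position p.
    m11 = m12 = m22 = v1 = v2 = 0.0
    for (px, py), theta in zip(array_positions, azimuths_rad):
        nx, ny = -math.sin(theta), math.cos(theta)
        c = nx * px + ny * py
        m11 += nx * nx
        m12 += nx * ny
        m22 += ny * ny
        v1 += nx * c
        v2 += ny * c

    det = m11 * m22 - m12 * m12
    if abs(det) < 1e-12:
        raise ValueError("bearings are (nearly) parallel; position not observable")

    # Closed-form inverse of the symmetric 2x2 matrix M.
    x = (m22 * v1 - m12 * v2) / det
    y = (m11 * v2 - m12 * v1) / det
    return x, y


# Hypothetical layout: four arrays at the corners of a 4 m x 4 m room.
corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
true_source = (1.0, 2.5)
bearings = [math.atan2(true_source[1] - py, true_source[0] - px)
            for px, py in corners]
estimate = locate_source(corners, bearings)
```

With noise-free bearings, as in this usage example, all four lines pass through the source and the estimate matches it up to floating-point rounding. With noisy bearings the result is the point that minimizes the sum of squared perpendicular distances to the four lines.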

Author information

Authors and Affiliations

Advanced Computing and Applications, LINKS Foundation, via P. C. Boggio, 61, Turin, Italy
Simone Ciccia, Alberto Scionti, Giacomo Vitali & Olivier Terzo

Corresponding author

Correspondence to Simone Ciccia.

Editor information

Editors and Affiliations

Department of Information and Communication Engineering, Faculty of Information Engineering, Fukuoka Institute of Technology, Fukuoka, Japan
Leonard Barolli
Institute of Information Technology, Lodz University of Technology, Łódź, Poland
Aneta Poniszewska-Maranda
Faculty of Business Administration, Rissho University, Tokyo, Japan
Tomoya Enokido

Cite this paper

Ciccia, S., Scionti, A., Vitali, G., Terzo, O. (2021). QuadCOINS-Network: A Deep Learning Approach to Sound Source Localization. In: Barolli, L., Poniszewska-Maranda, A., Enokido, T. (eds) Complex, Intelligent and Software Intensive Systems. CISIS 2020. Advances in Intelligent Systems and Computing, vol 1194. Springer, Cham. https://doi.org/10.1007/978-3-030-50454-0_13

DOI: https://doi.org/10.1007/978-3-030-50454-0_13
Published: 11 June 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50453-3
Online ISBN: 978-3-030-50454-0
eBook Packages: Intelligent Technologies and Robotics (R0)
