Deep Learning Locally Trained Wildlife Sensing in Real Acoustic Wetland Environment
We describe ‘Tidzam’, an application of deep learning that leverages a dense, multimodal sensor network installed at a large-scale wetland restoration site at Tidmarsh, a 600-acre former industrial-scale cranberry farm in Southern Massachusetts. Acoustic monitoring of wildlife is a crucial metric for post-restoration evaluation, and also a challenge in such a noisy outdoor environment. This article presents the complete Tidzam system, designed to identify, in real time, both ambient weather conditions and sonic events such as insects, small animals, and local bird species from microphones deployed on the site. The experiment offers insight into the use of deep learning technology in a real-world deployment. The originality of this work lies in the system’s ability to construct its own database from local audio samples under the supervision of human visitors and bird experts.
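The real-time identification described above typically starts by converting each microphone stream into a time–frequency representation before classification. The following is a minimal, hypothetical sketch of such a spectrogram front end in Python with NumPy; the window and hop sizes are illustrative assumptions, not Tidzam's actual parameters, and the deep network itself is omitted.

```python
import numpy as np

def spectrogram(signal, win=512, hop=256):
    """Frame a mono signal into overlapping Hann-windowed chunks and
    return a log-magnitude spectrogram of shape (frames, freq bins)."""
    window = np.hanning(win)
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[i * hop : i * hop + win] * window
                       for i in range(n_frames)])
    # Real FFT per frame; log1p compresses the dynamic range.
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(mag)

# One second of synthetic audio at 48 kHz: a 2 kHz tone plus noise,
# standing in for a microphone stream from the site.
sr = 48000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 2000 * t) + 0.1 * np.random.randn(sr)

spec = spectrogram(audio)
print(spec.shape)  # (186, 257): 186 frames, 257 frequency bins
```

In a streaming deployment, frames like these would be batched and fed to the classifier continuously, one window at a time per microphone channel.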
Keywords: Wildlife acoustic identification · Signal processing · Deep learning · Wetland environment
The authors would like to acknowledge Living Observatory and the Mass Audubon Tidmarsh Wildlife Sanctuary for the opportunity to realize the audio deployment at this location. The NVIDIA GPU Grant Program provided the two TITAN X GPUs used by Tidzam. Clement Duhart has been supported by the PRESTIGE Fellowship of Campus France and the Pôle Léonard de Vinci. We also thank the Elements Collaborative and the sponsors of the MIT Media Lab for their support of this work.