Deep Learning Locally Trained Wildlife Sensing in Real Acoustic Wetland Environment

  • Clement Duhart (corresponding author)
  • Gershon Dublon
  • Brian Mayton
  • Joseph Paradiso
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 968)


We describe ‘Tidzam’, an application of deep learning that leverages a dense, multimodal sensor network installed at a large wetland restoration site: Tidmarsh, a 600-acre former industrial-scale cranberry farm in southern Massachusetts. Acoustic wildlife monitoring is a crucial metric for post-restoration evaluation, as well as a challenge in such a noisy outdoor environment. This article presents the complete Tidzam system, designed to identify, in real time, ambient sounds such as weather conditions along with sonic events such as insects, small animals, and local bird species, from microphones deployed on the site. The experiment provides insight into the use of deep learning technology in a real deployment. The originality of this work lies in the system's ability to construct its own database from locally sampled audio under the supervision of human visitors and bird experts.


Keywords: Wildlife acoustic identification · Signal processing · Deep learning · Wetland environment
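The abstract describes a real-time audio-tagging pipeline: streams from field microphones are classified into weather conditions, insect, animal, and bird-species sounds. As a rough illustration of how such a pipeline is commonly structured, the sketch below classifies a short audio chunk via a log-mel spectrogram and a small convolutional network. The 500 ms window, the five-way class list, and the network architecture are illustrative assumptions, not the actual Tidzam design.

```python
# Minimal sketch of spectrogram-based acoustic event classification,
# in the spirit of real-time wildlife audio tagging. The class list,
# window length, and CNN layout are hypothetical, not Tidzam's.
import numpy as np
import librosa
import torch
import torch.nn as nn

CLASSES = ["bird", "insect", "rain", "wind", "quiet"]  # illustrative labels

class SmallAudioCNN(nn.Module):
    """Tiny CNN over log-mel spectrogram patches."""
    def __init__(self, n_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),   # fixed-size output regardless of chunk length
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):                   # x: (batch, 1, n_mels, n_frames)
        return self.classifier(self.features(x).flatten(1))

def log_mel(chunk: np.ndarray, sr: int = 48000, n_mels: int = 64) -> np.ndarray:
    """Mono audio chunk -> log-mel spectrogram of shape (n_mels, n_frames)."""
    S = librosa.feature.melspectrogram(y=chunk, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(S, ref=np.max)

if __name__ == "__main__":
    sr = 48000
    chunk = np.random.randn(sr // 2).astype(np.float32)  # stand-in for a 500 ms mic chunk
    spec = torch.from_numpy(log_mel(chunk, sr)).float().unsqueeze(0).unsqueeze(0)
    model = SmallAudioCNN(len(CLASSES))
    with torch.no_grad():
        probs = torch.softmax(model(spec), dim=-1)
    print(dict(zip(CLASSES, probs.squeeze(0).tolist())))
```

In a deployment such as the one described, a classifier of this kind would run continuously over overlapping windows from each microphone channel, with human-validated detections fed back into the training database.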



Acknowledgements

The authors thank Living Observatory and the Mass Audubon Tidmarsh Wildlife Sanctuary for the opportunity to realize the audio deployment at this location. The NVIDIA GPU Grant Program provided the two TITAN X GPUs used by Tidzam. Clement Duhart has been supported by the PRESTIGE Fellowship of Campus France and the Pôle Léonard de Vinci. We also thank the Elements Collaborative and the sponsors of the MIT Media Lab for their support of this work.



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

Clement Duhart (corresponding author), Gershon Dublon, Brian Mayton, and Joseph Paradiso

Responsive Environments Group, MIT Media Lab, Cambridge, USA
