
Masked Conditional Neural Networks for Environmental Sound Classification

  • Conference paper
  • In: Artificial Intelligence XXXIV (SGAI 2017)

Abstract

The ConditionaL Neural Network (CLNN) exploits the temporal sequencing of the sound signal as represented in a spectrogram, and its variant, the Masked ConditionaL Neural Network (MCLNN), induces the network to learn in frequency bands by embedding a filterbank-like sparseness over the network's links using a binary mask. The masking additionally automates the concurrent exploration of different feature combinations, analogous to hand-crafting the optimum combination of features for a recognition task. We evaluated the performance of the MCLNN on the Urbansound8k dataset of environmental sounds. We also present YorNoise, a collection of manually recorded rail and road traffic sounds, to investigate the confusion rates among machine-generated sounds possessing low-frequency components. The MCLNN achieved competitive results on Urbansound8k without augmentation, using 12% of the trainable parameters required by an equivalent model based on state-of-the-art Convolutional Neural Networks. Extending Urbansound8k with YorNoise, experiments showed that shared tonal properties affect the classification performance.
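The core mechanism the abstract describes, a binary mask that restricts each hidden unit to a band of spectrogram bins, can be illustrated with a short sketch. Below is a minimal NumPy construction of one plausible banded mask; the bandwidth and overlap controls echo the parameters used in the MCLNN papers, but the function name `band_mask` and the exact indexing scheme here are illustrative assumptions, not the authors' precise design.

```python
# A banded binary mask: each hidden unit is connected to a contiguous
# band of input (spectrogram) bins, and successive bands are shifted so
# neighbouring hidden units observe overlapping feature combinations.
import numpy as np

def band_mask(n_features, n_hidden, bandwidth, overlap):
    """Build an (n_features, n_hidden) binary mask with band-limited
    connectivity. Bands step by (bandwidth - overlap) per hidden unit
    and wrap around the feature axis."""
    assert overlap < bandwidth, "overlap must be smaller than the bandwidth"
    mask = np.zeros((n_features, n_hidden), dtype=np.float32)
    shift = bandwidth - overlap
    for h in range(n_hidden):
        start = (h * shift) % n_features
        for k in range(bandwidth):
            mask[(start + k) % n_features, h] = 1.0
    return mask

# Example: 40 mel bins, 32 hidden units, 5-bin bands overlapping by 3 bins.
mask = band_mask(n_features=40, n_hidden=32, bandwidth=5, overlap=3)
# Applied element-wise to a layer's weights (W * mask), only the in-band
# links survive, enforcing the filterbank-like sparseness.
```

Because successive bands are shifted rather than identical, different hidden units concurrently observe different feature combinations, which is the automated feature-combination exploration the abstract refers to.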

This work is funded by the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 608014 (CAPACITIE).


Notes

  1. https://keras.io
  2. http://deeplearning.net/software/theano
  3. https://librosa.github.io/ (an audio analysis library; a usage sketch follows this list)
  4. http://ffmpeg.org/
  5. https://github.com/fadymedhat/YorNoise
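The librosa footnote above points to the library commonly used for the kind of spectrogram front end the abstract assumes. As a hedged illustration only (the paper's actual frame size, hop length, and number of mel bands are not given here), a log mel-scaled spectrogram can be computed as follows:

```python
# Illustrative front end: compute a log mel-scaled spectrogram with librosa.
# The file name and the n_fft/hop_length/n_mels settings are assumed
# placeholders, not the paper's reported configuration.
import librosa

y, sr = librosa.load("example.wav", sr=22050)  # mono audio at 22.05 kHz
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=512, n_mels=60)
log_mel = librosa.power_to_db(mel)  # log-scaled mel energies

# Each column of log_mel is one time frame; the CLNN/MCLNN conditions the
# prediction for a frame on a window of its neighbouring frames.
```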


Author information

Corresponding author: Fady Medhat.


Copyright information

© 2017 Springer International Publishing AG

About this paper


Cite this paper

Medhat, F., Chesmore, D., Robinson, J. (2017). Masked Conditional Neural Networks for Environmental Sound Classification. In: Bramer, M., Petridis, M. (eds) Artificial Intelligence XXXIV. SGAI 2017. Lecture Notes in Computer Science, vol 10630. Springer, Cham. https://doi.org/10.1007/978-3-319-71078-5_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-71078-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-71077-8

  • Online ISBN: 978-3-319-71078-5

  • eBook Packages: Computer Science, Computer Science (R0)
