Abstract
The ConditionaL Neural Network (CLNN) exploits the temporal sequencing of a sound signal represented in a spectrogram, and its variant, the Masked ConditionaL Neural Network (MCLNN), induces the network to learn in frequency bands by embedding filterbank-like sparseness over the network's links using a binary mask. The masking also automates the concurrent exploration of different feature combinations, analogous to hand-crafting the optimum feature combination for a recognition task. We evaluated MCLNN performance on the Urbansound8k dataset of environmental sounds. We also present YorNoise, a collection of manually recorded rail and road traffic sounds, to investigate confusion rates among machine-generated sounds sharing low-frequency components. On Urbansound8k, the MCLNN achieved competitive results without augmentation, using 12% of the trainable parameters of an equivalent model based on state-of-the-art Convolutional Neural Networks. We further extended Urbansound8k with YorNoise, where experiments showed that common tonal properties affect classification performance.
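To make the masking concrete, below is a minimal NumPy sketch of a filterbank-like binary mask applied element-wise to a dense weight matrix. The band_mask helper, its bandwidth/overlap parameters, and the wrap-around band placement are illustrative assumptions drawn from the description above, not the authors' released implementation; the full MCLNN additionally conditions each frame on its temporal neighbours, which this sketch omits.

```python
import numpy as np

def band_mask(n_in, n_hidden, bandwidth, overlap):
    """Build a binary mask with filterbank-like sparseness:
    each hidden unit connects to one contiguous band of
    `bandwidth` input bins; successive bands shift by
    (bandwidth - overlap) bins and wrap around the input axis.
    (Illustrative placement rule, assumed for this sketch.)"""
    mask = np.zeros((n_in, n_hidden))
    shift = bandwidth - overlap
    for j in range(n_hidden):
        start = (j * shift) % n_in
        for k in range(bandwidth):
            mask[(start + k) % n_in, j] = 1.0
    return mask

rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 60))      # batch of 8 spectrogram frames, 60 mel bins
W = rng.standard_normal((60, 100)) * 0.01  # dense weights: 60 inputs -> 100 hidden units
b = np.zeros(100)
M = band_mask(60, 100, bandwidth=20, overlap=5)

# Element-wise product keeps only in-band links; out-of-band
# weights are silenced during both training and inference.
hidden = np.tanh(frames @ (W * M) + b)
print(hidden.shape)                        # (8, 100)
```

Because the mask is fixed rather than learned, applying W * M at every forward pass zeroes the gradients of out-of-band weights as well, so the effective trainable parameter count drops in proportion to the mask's sparsity.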
This work is funded by the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 608014 (CAPACITIE).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Medhat, F., Chesmore, D., Robinson, J. (2017). Masked Conditional Neural Networks for Environmental Sound Classification. In: Bramer, M., Petridis, M. (eds) Artificial Intelligence XXXIV. SGAI 2017. Lecture Notes in Computer Science, vol 10630. Springer, Cham. https://doi.org/10.1007/978-3-319-71078-5_2
DOI: https://doi.org/10.1007/978-3-319-71078-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71077-8
Online ISBN: 978-3-319-71078-5
eBook Packages: Computer Science (R0)