
Sample Dropout for Audio Scene Classification Using Multi-scale Dense Connected Convolutional Neural Network

  • Conference paper
Knowledge Management and Acquisition for Intelligent Systems (PKAW 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11016)


Abstract

Acoustic scene classification is an intricate problem for a machine. As an emerging field of research, deep Convolutional Neural Networks (CNNs) have achieved convincing results. In this paper, we explore the use of a multi-scale densely connected convolutional neural network (DenseNet) for the classification task, with the goal of improving classification performance, as multi-scale features can be extracted from the time-frequency representation of the audio signal. Most previous CNN-based audio scene classification approaches aim to improve classification accuracy by employing regularization techniques, such as the dropout of hidden units and data augmentation, to reduce overfitting. It is widely known that outliers in the training set have a strong negative influence on the trained model, and that culling them may improve classification performance, yet this aspect remains under-explored in previous studies. In this paper, inspired by silence removal in speech signal processing, a novel sample dropout approach is proposed, which aims to remove outliers from the training dataset. On the DCASE 2017 audio scene classification dataset, the experimental results demonstrate that the proposed multi-scale DenseNet provides superior performance to the traditional single-scale DenseNet, while the sample dropout method further improves the classification robustness of the multi-scale DenseNet.
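The abstract does not spell out how sample dropout is carried out, so the sketch below only illustrates the general idea: score every training sample for "outlier-ness", then discard the highest-scoring fraction before training. The function names (`sample_dropout`, `centroid_distance`) and the scoring strategy (distance to the class centroid) are assumptions for illustration, not the authors' actual procedure.

```python
import numpy as np

def sample_dropout(features, labels, score_fn, drop_fraction=0.1):
    # Score every training sample, then keep only the lowest-scoring
    # (least outlier-like) portion of the training set.
    scores = score_fn(features, labels)
    n_keep = len(features) - int(len(features) * drop_fraction)
    keep = np.argsort(scores)[:n_keep]
    return features[keep], labels[keep]

def centroid_distance(features, labels):
    # Illustrative outlier score: the distance of each sample to the
    # mean (centroid) of its own class.
    scores = np.empty(len(features))
    for c in np.unique(labels):
        mask = labels == c
        centroid = features[mask].mean(axis=0)
        scores[mask] = np.linalg.norm(features[mask] - centroid, axis=1)
    return scores

# Demo: inject five obvious outliers and check that they are culled.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = rng.integers(0, 2, size=100)
X[:5] += 10.0  # shift five rows far from their class centroid
X_clean, y_clean = sample_dropout(X, y, centroid_distance, drop_fraction=0.05)
```

In practice the score could just as well come from a trained model (e.g. per-sample loss), mirroring how silence removal discards frames that carry no useful signal.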



Acknowledgement

This study was supported by the Strategic Priority Research Programme (17-ZLXD-XX-02-06-02-08).

Author information

Corresponding author

Correspondence to Kele Xu.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Feng, D., Xu, K., Mi, H., Liao, F., Zhou, Y. (2018). Sample Dropout for Audio Scene Classification Using Multi-scale Dense Connected Convolutional Neural Network. In: Yoshida, K., Lee, M. (eds) Knowledge Management and Acquisition for Intelligent Systems. PKAW 2018. Lecture Notes in Computer Science, vol 11016. Springer, Cham. https://doi.org/10.1007/978-3-319-97289-3_9

  • DOI: https://doi.org/10.1007/978-3-319-97289-3_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-97288-6

  • Online ISBN: 978-3-319-97289-3

  • eBook Packages: Computer Science, Computer Science (R0)
