
Shallow Convolutional Neural Networks for Acoustic Scene Classification

  • Computer Science
Wuhan University Journal of Natural Sciences

Abstract

Deep neural networks, including convolutional neural networks (CNNs), have recently been widely applied to acoustic scene classification (ASC). Motivated by the observation that some simplified CNNs outperform deeper CNNs such as the Visual Geometry Group Net (VGG-Net), we show how to simplify a VGG-Net-style architecture into a shallow CNN with improved performance. Max pooling and batch normalization are also applied to improve accuracy. In a series of controlled tests on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 data sets, our shallow CNN achieves a 6.7% improvement over the VGG-Net-style CNN while reducing time complexity to 5% of that network's.
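The abstract names three ingredients without giving the layer configuration: max pooling, batch normalization, and a time-complexity comparison between a deep and a shallow network. The pure-Python sketch below illustrates each one under stated assumptions. `batch_norm` normalizes one channel's activations over a mini-batch (Ioffe and Szegedy's formulation), `max_pool2d` applies non-overlapping 2×2 pooling to a time-frequency feature map, and `conv_time_complexity` counts multiply-accumulates with He and Sun's per-layer measure n_{l-1} · s_l² · n_l · m_l². All function names, the pooling window and `eps` are hypothetical choices for illustration, not the authors' implementation.

```python
"""Illustrative sketch of the building blocks named in the abstract.

Assumptions (not from the paper): function names, the 2x2 pooling window,
eps=1e-5, and square output maps in the complexity count.
"""
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def batch_norm(values: Sequence[float], gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> List[float]:
    """Normalize one channel's activations over a mini-batch, then scale and shift."""
    n = len(values)
    mean = sum(values) / n
    # Biased (1/n) mini-batch variance, as used during training.
    var = sum((v - mean) ** 2 for v in values) / n
    # eps keeps the division finite when all values are equal.
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


def max_pool2d(fmap: Matrix, size: int = 2, stride: int = 2) -> Matrix:
    """Max pooling over a 2-D feature map (e.g. time frames x frequency bins)."""
    rows = (len(fmap) - size) // stride + 1
    cols = (len(fmap[0]) - size) // stride + 1
    return [
        [max(fmap[r * stride + i][c * stride + j]
             for i in range(size) for j in range(size))
         for c in range(cols)]
        for r in range(rows)
    ]


def conv_time_complexity(layers: Sequence[Tuple[int, int, int, int]]) -> int:
    """Total multiply-accumulates of a stack of convolutional layers.

    Each tuple is (n_prev, s, n, m): input channels, kernel side,
    output channels and output map side (square maps assumed).
    """
    return sum(n_prev * s * s * n * m * m for n_prev, s, n, m in layers)
```

Because each layer's cost grows with both channel counts and with the square of its output map size, dropping stacked wide convolutional layers and pooling early are natural levers for the kind of time reduction the abstract reports. The sketch does not reproduce the paper's 5% figure, which depends on configurations not given here.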



Author information

Correspondence to Yuhong Yang.

Additional information

Foundation item: Supported by the National Natural Science Foundation of China (61102127, 61231015), the National High Technology Research and Development Program of China (863 Program, 2015AA016306), the National Key Research and Development Program (2016YFB0502204), the Innovation Fund of Shanghai Aerospace Science and Technology (SAST, 2015014), the Key Technology R&D Program of Hubei Province (2014BAA153), and SKLSE-2015-A-06.


About this article


Cite this article

Lu, L., Yang, Y., Jiang, Y. et al. Shallow Convolutional Neural Networks for Acoustic Scene Classification. Wuhan Univ. J. Nat. Sci. 23, 178–184 (2018). https://doi.org/10.1007/s11859-018-1308-z



  • DOI: https://doi.org/10.1007/s11859-018-1308-z
