Shallow Convolutional Neural Networks for Acoustic Scene Classification

Lu, Lu; Yang, Yuhong; Jiang, Yuzhi; Ai, Haojun; Tu, Weiping

doi:10.1007/s11859-018-1308-z

Computer Science
Published: 19 March 2018

Volume 23, pages 178–184, (2018)

Wuhan University Journal of Natural Sciences

Lu Lu^1,2, Yuhong Yang^1,3, Yuzhi Jiang^1,2, Haojun Ai^1,3 & Weiping Tu^1,2

Abstract

Recently, deep neural networks, including convolutional neural networks (CNNs), have been widely applied to acoustic scene classification (ASC). Motivated by the fact that some simplified CNNs have shown improvements over deep CNNs such as the Visual Geometry Group Net (VGG-Net), we simplify the VGG-Net style architecture into a shallow CNN with improved performance. Max pooling and batch normalization are also applied for better accuracy. In a series of controlled tests on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 data sets, our shallow CNN achieves a 6.7% improvement and reduces time complexity to 5%, compared with the VGG-Net style CNN.
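
As an illustration of how removing convolutional layers lowers time complexity, the sketch below uses a common estimate for the cost of a stack of convolutional layers: the sum over layers of (input channels) × (kernel side)² × (output channels) × (output feature-map side)². The abstract does not state the layer configurations or how complexity was measured, so the `ConvLayer` type, the `conv_cost` helper, and both layer lists are hypothetical and chosen only for demonstration. They are not the networks evaluated in the article.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ConvLayer:
    """One convolutional layer with square kernels and square output maps."""
    in_channels: int
    out_channels: int
    kernel_size: int  # side length of the square kernel
    out_size: int     # side length of the square output feature map


def conv_cost(layers: Iterable[ConvLayer]) -> int:
    """Estimate multiply-accumulate operations of the convolutional layers.

    Each layer contributes in_channels * kernel_size^2 * out_channels * out_size^2.
    Pooling, batch normalization, and fully connected layers are ignored.
    """
    return sum(
        layer.in_channels * layer.kernel_size ** 2
        * layer.out_channels * layer.out_size ** 2
        for layer in layers
    )


# Hypothetical VGG-style stack: four 3x3 convolutions in two blocks.
deep = [
    ConvLayer(1, 32, 3, 64),
    ConvLayer(32, 32, 3, 64),
    ConvLayer(32, 64, 3, 32),
    ConvLayer(64, 64, 3, 32),
]

# Hypothetical shallow network: a single 5x5 convolution.
shallow = [ConvLayer(1, 32, 5, 32)]

# Fraction of the deep network's convolutional cost used by the shallow one.
ratio = conv_cost(shallow) / conv_cost(deep)
print(f"shallow / deep convolutional cost: {ratio:.4f}")
```

In this toy setting most of the cost comes from layers whose input and output both have many channels, so removing the later layers of a deep stack cuts the estimate sharply. The ratio printed here depends entirely on the assumed configurations and says nothing about the 5% figure reported above.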

Author information

Authors and Affiliations

National Engineering Research Center for Multimedia Software, Wuhan University, Wuhan, 430072, Hubei, China
Lu Lu, Yuhong Yang, Yuzhi Jiang, Haojun Ai & Weiping Tu
The Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, 430072, Hubei, China
Lu Lu, Yuzhi Jiang & Weiping Tu
Collaborative Innovation Center of Geospatial Technology, Wuhan, 430072, Hubei, China
Yuhong Yang & Haojun Ai

Corresponding author

Correspondence to Yuhong Yang.

Additional information

Foundation item: Supported by the National Natural Science Foundation of China (61102127, 61231015), the National High Technology Research and Development Program of China (863 Program, 2015AA016306), the National Key Research and Development Program (2016YFB0502204), the Innovation Fund of Shanghai Aerospace Science and Technology (SAST, 2015014), the Key Technology R&D Program of Hubei Province (2014BAA153), and SKLSE-2015-A-06.

About this article

Cite this article

Lu, L., Yang, Y., Jiang, Y. et al. Shallow Convolutional Neural Networks for Acoustic Scene Classification. Wuhan Univ. J. Nat. Sci. 23, 178–184 (2018). https://doi.org/10.1007/s11859-018-1308-z

Received: 03 May 2017
Published: 19 March 2018
Issue Date: April 2018
DOI: https://doi.org/10.1007/s11859-018-1308-z

Key words

CLC number

TP 391
