Multimedia Tools and Applications

Volume 78, Issue 3, pp 3831–3842

Dictionary-based active learning for sound event classification

  • Wanting Ji
  • Ruili Wang
  • Junbo Ma


This paper proposes a new dictionary-based active learning method for sound event classification that significantly reduces the number of labeled samples required to train a classifier. Active learning is the process of selecting which samples should be labeled. In our method, active learning is based on clustering, and we use dictionary-based clustering because dictionary learning is better suited to sound event classification. The classifier is trained on both unlabeled sound segments (using their predicted labels) and a small number of labeled samples. The proposed method and several reference methods are implemented and evaluated on a public urban sound dataset of 8732 sound segments, with classification accuracy used to measure the performance of the resulting classifiers. Experimental results show that the proposed method achieves higher classification accuracy while requiring far fewer labeled samples than the other methods.
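The general idea of clustering-based active learning (query labels only for cluster representatives, propagate them to the other cluster members, then train on labeled plus pseudo-labeled data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain Euclidean distance where the paper uses dictionary-based clustering, a simple alternating k-medoids update with a deterministic farthest-first initialization, and a 1-nearest-neighbour classifier. The function names (`k_medoids`, `active_learn`, `nearest_neighbour`) and the toy data are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Plain Euclidean distance (an assumed stand-in for a dictionary-based distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_medoids(points: List[Point], k: int,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids. Returns the medoid indices and, for every point,
    the position (0..k-1) of the medoid it is assigned to."""
    n = len(points)
    d = [[euclidean(p, q) for q in points] for p in points]

    # Deterministic farthest-first initialization (an illustrative choice):
    # start from the most central point, then repeatedly add the point that
    # is farthest from every medoid chosen so far.
    medoids = [min(range(n), key=lambda i: sum(d[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(d[i][m] for m in medoids)))

    def assign_all(meds: List[int]) -> List[int]:
        # Assignment step: each point joins its nearest medoid.
        return [min(range(k), key=lambda c: d[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        assign = assign_all(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if assign[i] == c]
            if not members:  # keep the medoid of an empty cluster unchanged
                new_medoids.append(medoids[c])
                continue
            # Update step: the member with the smallest total distance to the
            # other members of its cluster becomes the new medoid.
            new_medoids.append(min(members, key=lambda m: sum(d[m][j] for j in members)))
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids, assign_all(medoids)


def active_learn(points: List[Point], k: int,
                 oracle: Callable[[int], str]) -> Tuple[List[int], List[str]]:
    """Query the oracle (a human annotator) only for the k medoids, then
    propagate each medoid's label to the members of its cluster."""
    medoids, assign = k_medoids(points, k)
    queried = [oracle(m) for m in medoids]  # the only manually labeled samples
    predicted = [queried[c] for c in assign]
    return medoids, predicted


def nearest_neighbour(train_x: List[Point], train_y: List[str], x: Point) -> str:
    """1-NN classifier trained on labeled medoids plus pseudo-labeled points."""
    best = min(range(len(train_x)), key=lambda i: euclidean(train_x[i], x))
    return train_y[best]


if __name__ == "__main__":
    # Toy 2-D "feature vectors"; the truth list stands in for a human annotator.
    points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
              (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    truth = ["dog_bark"] * 3 + ["siren"] * 3
    medoids, predicted = active_learn(points, k=2, oracle=truth.__getitem__)
    accuracy = sum(p == t for p, t in zip(predicted, truth)) / len(truth)
    print(len(medoids), accuracy)                               # 2 1.0
    print(nearest_neighbour(points, predicted, (0.05, 0.05)))   # dog_bark
```

Here only two of the six toy points are sent to the oracle; the other four receive propagated labels. The `euclidean` function is the natural place to plug in a dictionary-based similarity of the kind the paper describes; how that similarity is defined is not specified in this abstract.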


Keywords: Active learning; k-medoids clustering; Dictionary learning; Sound event classification



This work was supported in part by the Natural Science Foundation of Zhejiang Province (No. LY18F010008) and the Marsden Fund of New Zealand.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Zhejiang Gongshang University, Hangzhou, China
  2. Massey University, Auckland, New Zealand
