Smart Ambient Sound Analysis via Structured Statistical Modeling

  • Jialie Shen
  • Liqiang Nie
  • Tat-Seng Chua
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9517)


In this paper, we introduce a novel framework called SASA (Smart Ambient Sound Analyser) to support different ambient audio mining tasks (e.g., audio classification and location estimation). To model ambient sound comprehensively, SASA extracts a variety of acoustic features from different sound components (e.g., music, voice and background) and translates them into structured information. This significantly enhances the quality of audio content representation. Further, in contrast to existing approaches, SASA's multilayered architecture integrates mixture models and the aPEGASOS (adaptive PEGASOS) SVM algorithm into a unified classification framework, so the approach can leverage the complementary strengths of both models. Experimental results on three large test collections demonstrate SASA's advantages over existing methods on various analysis tasks.
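The abstract names aPEGASOS but does not describe its adaptive step, so the following is only a minimal sketch of the standard (non-adaptive) Pegasos stochastic sub-gradient update for a linear SVM, not the authors' method. The function names `pegasos_train` and `predict`, the hyperparameters `lam` and `n_iters`, and the toy data are all assumptions made for illustration; the mixture-model stage of SASA is not shown.

```python
import random


def pegasos_train(samples, labels, lam=0.01, n_iters=2000, seed=0):
    """Standard Pegasos for a linear SVM without a bias term.

    samples: list of feature vectors (lists of floats)
    labels:  list of +1 / -1 class labels
    lam:     regularisation strength (assumed value, not from the paper)
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    for t in range(1, n_iters + 1):
        # Pick one training example uniformly at random.
        i = rng.randrange(len(samples))
        x, y = samples[i], labels[i]
        eta = 1.0 / (lam * t)  # decaying step size 1 / (lambda * t)
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        # Shrink the weights (gradient of the L2 regulariser).
        w = [(1.0 - eta * lam) * wj for wj in w]
        # Add the hinge-loss sub-gradient only if the margin is violated.
        if margin < 1.0:
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 according to the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Hypothetical, linearly separable toy features (not data from the paper).
X = [[2.0, 2.0], [3.0, 1.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -1.0], [-1.0, -3.0]]
Y = [1, 1, 1, -1, -1, -1]
weights = pegasos_train(X, Y)
```

In a pipeline of the kind the abstract outlines, the input vectors would presumably be structured features derived from the mixture models rather than raw coordinates; that coupling is an assumption here, since the abstract gives no detail.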


Keywords: Ambient intelligence; Environmental sound analysis



This work was partly supported by Singapore Ministry of Education Academic Research Fund Tier 2 (MOE2013-T2-2-156), Singapore.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. School of Information Systems, Singapore Management University, Singapore, Singapore
  2. School of Computing, National University of Singapore, Singapore, Singapore
