Multimedia Tools and Applications, Volume 74, Issue 15, pp 5375–5400

Efficient implementation techniques of an SVM-based speech/music classifier in SMV



Real-time speech and audio encoders used in multimedia applications require low-complexity encoding algorithms. Accurate classification of input signals is a key prerequisite for variable bit rate encoding, which was introduced to make effective use of limited communication bandwidth. This paper investigates implementation issues with a support vector machine (SVM)-based speech/music classifier in the framework of the selectable mode vocoder (SMV), a standard codec adopted by the Third-Generation Partnership Project 2 (3GPP2). While a support vector machine is well known for its superior classification capability, it comes at a high computational cost. To achieve a more realizable system, we propose two techniques for the SVM-based speech/music classifier, both aimed at reducing the number of classification requests sent to the classifier. The first technique introduces a simpler classifier that processes some of the input frames in place of the SVM-based classifier, and the second technique skips a portion of the input frames by exploiting the strong inter-frame correlation in speech and music frames. Our experimental results show that the proposed techniques can reduce the computational cost of the SVM-based classifier by 95.4% with negligible performance degradation, making its integration into the SMV codec feasible.
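The two techniques can be illustrated with a small sketch. The abstract does not specify the simpler classifier, the skipping rule, or any thresholds, so everything below is an assumption made for illustration: the names `classify_stream`, `toy_cheap` and `toy_svm`, the parameters `stable_run` and `skip`, and the mean-magnitude thresholds are all hypothetical and are not taken from the paper. The sketch only shows the general idea of sending fewer frames to an expensive classifier.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = Sequence[float]
SPEECH, MUSIC = 0, 1


def classify_stream(
    frames: List[Frame],
    cheap: Callable[[Frame], Optional[int]],
    svm: Callable[[Frame], int],
    stable_run: int = 3,   # assumed: decisions in a row before skipping starts
    skip: int = 2,         # assumed: frames that then reuse the last label
) -> Tuple[List[int], int]:
    """Label every frame while calling `svm` as rarely as possible.

    `cheap(frame)` returns a label, or None when it is not confident.
    Returns the per-frame labels and the number of `svm` calls made.
    """
    labels: List[int] = []
    svm_calls = 0
    run = 0       # length of the current run of identical decisions
    to_skip = 0   # frames still to be labelled by reuse
    for frame in frames:
        # Technique 2 (assumed form): reuse the previous label for skipped frames.
        if to_skip > 0 and labels:
            labels.append(labels[-1])
            to_skip -= 1
            continue
        # Technique 1 (assumed form): try the simpler classifier first.
        label = cheap(frame)
        if label is None:
            label = svm(frame)
            svm_calls += 1
        run = run + 1 if labels and label == labels[-1] else 1
        labels.append(label)
        if run >= stable_run:
            to_skip = skip
            run = 0
    return labels, svm_calls


def _mean_abs(frame: Frame) -> float:
    return sum(abs(x) for x in frame) / len(frame)


def toy_cheap(frame: Frame) -> Optional[int]:
    """Hypothetical stand-in: decides only on clear-cut frames."""
    m = _mean_abs(frame)
    if m < 0.2:
        return SPEECH
    if m > 0.8:
        return MUSIC
    return None


def toy_svm(frame: Frame) -> int:
    """Hypothetical stand-in for the expensive SVM decision function."""
    return MUSIC if _mean_abs(frame) >= 0.5 else SPEECH
```

In a real codec, `svm` would wrap the trained SVM decision function and `cheap` would be whatever low-cost classifier is chosen; how the skipping parameters trade accuracy against cost would have to be measured on real data.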


Keywords: Speech/music classification; Support vector machine; Selectable mode vocoder; Embedded system



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Korea National University of Transportation, Chungbuk, Republic of Korea
  2. Hanyang University, Seoul, Republic of Korea
