Speech Emotion Recognition Using Multiple Classifiers

  • Conference paper
Web and Big Data (APWeb-WAIM 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10612)

Abstract

The automatic identification of a speaker's emotional state has received much attention. In this paper, we focus on speech emotion recognition and develop an audio-based classification framework that identifies five emotions in our audio database, whose segments are taken from Chinese TV plays. First, acoustic features are extracted from the audio segments using wavelet analysis. Feature selection based on information gain and Sequential Forward Selection is then applied to discard irrelevant information and reduce dimensionality. The classification framework is built on three base classifiers: SVM, AdaBoost, and Random Forest. Because a single classifier has limited recognition capability, decision fusion methods are applied to aggregate the labels predicted by the different classifiers. In experiments on our database, the proposed fusion methods perform better than the single base classifiers.
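
The abstract does not say which decision fusion rules are used, so the following is only a minimal sketch of the general idea, assuming two common rules: majority voting and accuracy-weighted voting over the labels predicted by three base classifiers. The emotion names, the per-segment predictions, and the classifier weights are hypothetical placeholders, not data or results from the paper.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the most frequent label; ties go to the earliest classifier."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def weighted_vote(labels, weights):
    """Return the label with the largest summed classifier weight."""
    scores = defaultdict(float)
    for label, weight in zip(labels, weights):
        scores[label] += weight
    best = max(scores.values())
    for label in labels:
        if scores[label] == best:
            return label


# Hypothetical predictions for four audio segments (placeholder labels).
svm_pred = ["angry", "happy", "sad", "neutral"]
ada_pred = ["angry", "sad", "sad", "happy"]
rf_pred = ["happy", "happy", "sad", "surprise"]

# Hypothetical weights, e.g. each classifier's validation accuracy.
weights = [0.5, 0.6, 0.7]

per_segment = list(zip(svm_pred, ada_pred, rf_pred))
fused_majority = [majority_vote(list(votes)) for votes in per_segment]
fused_weighted = [weighted_vote(list(votes), weights) for votes in per_segment]

print(fused_majority)
print(fused_weighted)
```

The two rules differ only when the classifiers disagree completely, as in the last segment: majority voting falls back to the first classifier in the list, while weighted voting follows the classifier with the largest weight.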

Acknowledgements

This work was supported by the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (No. 201700014), the Anhui Provincial Natural Science Foundation (No. 1708085MF167), and the Anhui Province Key Laboratory project of affective computing and advanced intelligent machines under grant ACAIM160103. Correspondence should be addressed to Li Liu and Kunxia Wang.

Correspondence to Kunxia Wang or Li Liu.

Copyright information

© 2017 Springer International Publishing AG

Cite this paper

Wang, K., Chu, Z., Wang, K., Yu, T., Liu, L. (2017). Speech Emotion Recognition Using Multiple Classifiers. In: Song, S., Renz, M., Moon, YS. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science, vol 10612. Springer, Cham. https://doi.org/10.1007/978-3-319-69781-9_9

  • DOI: https://doi.org/10.1007/978-3-319-69781-9_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69780-2

  • Online ISBN: 978-3-319-69781-9

  • eBook Packages: Computer Science, Computer Science (R0)
