Abstract
Automatically identifying a speaker's emotional state has received much attention. In this paper, we focus on speech emotion recognition and develop an audio-based classification framework for identifying five emotions in our audio database, whose segments are taken from Chinese TV plays. First, acoustic features are extracted from the audio segments using wavelet analysis; feature selection based on information gain and Sequential Forward Selection is then applied to discard irrelevant information and reduce dimensionality. Our classification framework is built on three base classifiers: SVM, AdaBoost, and Random Forest. Because a single classifier has limited recognition capability, decision fusion methods are applied to aggregate the labels predicted by the different classifiers. In experiments on our database, the proposed fusion methods perform better than a single classifier.
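The abstract does not say which decision fusion rules are used, so the following is only a minimal sketch of one common option, plurality (majority) voting over the labels predicted by the three base classifiers. The function name `majority_vote`, the tie-breaking rule via a `priority` list, the emotion label names, and the example predictions are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions, priority):
    """Fuse per-classifier label predictions by plurality vote.

    predictions: dict mapping classifier name -> list of predicted labels,
                 one label per audio segment (all lists the same length).
    priority:    classifier names in tie-breaking order (a hypothetical rule,
                 not one described in the paper).
    """
    names = list(predictions)
    n_samples = len(predictions[names[0]])
    if any(len(predictions[n]) != n_samples for n in names):
        raise ValueError("all classifiers must predict the same number of samples")

    fused = []
    for i in range(n_samples):
        votes = Counter(predictions[n][i] for n in names)
        top = max(votes.values())
        tied = {label for label, count in votes.items() if count == top}
        if len(tied) == 1:
            fused.append(next(iter(tied)))
        else:
            # Tie: take the label from the highest-priority classifier
            # whose prediction is among the tied labels.
            fused.append(
                next(predictions[n][i] for n in priority if predictions[n][i] in tied)
            )
    return fused


# Made-up predictions for four audio segments (placeholder label names).
preds = {
    "svm":           ["angry", "happy", "sad",     "neutral"],
    "adaboost":      ["angry", "sad",   "sad",     "happy"],
    "random_forest": ["happy", "happy", "neutral", "fear"],
}
fused = majority_vote(preds, priority=["svm", "random_forest", "adaboost"])
print(fused)  # ['angry', 'happy', 'sad', 'neutral']
```

In the last segment all three classifiers disagree, so the fused label falls back to the first classifier in `priority`. A weighted vote or a probability-averaging rule would be an equally plausible reading of "decision fusion" here.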
Acknowledgements
This work was supported by the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (No. 201700014), the Anhui Provincial Natural Science Foundation (No. 1708085MF167), and the Anhui Province Key Laboratory project of affective computing and advanced intelligent machines under grant ACAIM160103. Correspondence should be addressed to Li Liu and Kunxia Wang.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Wang, K., Chu, Z., Wang, K., Yu, T., Liu, L. (2017). Speech Emotion Recognition Using Multiple Classifiers. In: Song, S., Renz, M., Moon, YS. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science, vol 10612. Springer, Cham. https://doi.org/10.1007/978-3-319-69781-9_9
DOI: https://doi.org/10.1007/978-3-319-69781-9_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69780-2
Online ISBN: 978-3-319-69781-9
eBook Packages: Computer Science, Computer Science (R0)