
International Journal of Speech Technology

Volume 22, Issue 4, pp 937–957

Meta-heuristic approach in neural network for stress detection in Marathi speech

  • Vaijanath V. Yerigeri
  • L. K. Ragha
Article

Abstract

Stress is a form of psychalgia, i.e., psychogenic pain. Given the modern human lifestyle, psychogenic pain is both the most recurrent and the most damaging form of psychalgia, and in its severest form stress has driven many individuals to death. According to a 2015 WHO study, around 800,000 people die by suicide every year (one person every 40 seconds). An effective response to this problem is an efficient, automated, and unbiased stress detection technique built on proven measures: speech emotion recognition (SER). Stress is not itself an emotion, but it gives rise to specific emotions. This paper proposes SER using a neural network classifier whose weights are optimized by a fusion of meta-heuristic algorithms, namely the bat algorithm, genetic algorithm, particle swarm optimization, and simulated annealing. The classifier is trained on a multi-model feature set: Gammatone wavelet cepstral coefficients (GWCC), Mel-frequency cepstral coefficients (MFCC), pitch, vocal tract frequency, and energy are the features used to identify the different emotions. Since detecting the stress level is the main objective, the SUSAS benchmark database and a Marathi-language database are used for performance analysis. Performance is reported both as the cost function used to evaluate the meta-heuristic optimization algorithms and as emotion detection accuracy. An overall accuracy of 84.2% on stress-related emotions is achieved.
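
The weight-optimization step is the technical core of the proposal, so a short illustration may help. Below is a minimal sketch, assuming a plain global-best PSO on synthetic data; it is not the authors' implementation. PSO is one of the four fused meta-heuristics, the 16-dimensional feature vectors stand in for the GWCC, MFCC, pitch, vocal tract frequency, and energy features described above, and the misclassification rate stands in for the paper's cost function.

```python
# Sketch (assumption, not the authors' code): tuning the weights of a small
# feed-forward classifier with particle swarm optimization (PSO), one of the
# four meta-heuristics fused in the paper. Data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 utterance-level feature vectors (dim 16), 4 emotions.
X = rng.normal(size=(200, 16))
y = rng.integers(0, 4, size=200)

N_IN, N_HID, N_OUT = 16, 8, 4
DIM = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT  # flat weight-vector size

def unpack(w):
    """Split a flat particle position into the network's weight matrices."""
    i = 0
    W1 = w[i:i + N_IN * N_HID].reshape(N_IN, N_HID); i += N_IN * N_HID
    b1 = w[i:i + N_HID]; i += N_HID
    W2 = w[i:i + N_HID * N_OUT].reshape(N_HID, N_OUT); i += N_HID * N_OUT
    b2 = w[i:]
    return W1, b1, W2, b2

def cost(w):
    """Misclassification rate of the network encoded by w (PSO fitness)."""
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1 + b1)
    logits = h @ W2 + b2
    return np.mean(logits.argmax(axis=1) != y)

# Standard global-best PSO: v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)
P, ITERS = 30, 100
pos = rng.normal(scale=0.5, size=(P, DIM))
vel = np.zeros((P, DIM))
pbest = pos.copy()
pbest_cost = np.array([cost(p) for p in pos])
gbest = pbest[pbest_cost.argmin()].copy()

for _ in range(ITERS):
    r1, r2 = rng.random((P, 1)), rng.random((P, 1))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos += vel
    c = np.array([cost(p) for p in pos])
    improved = c < pbest_cost
    pbest[improved], pbest_cost[improved] = pos[improved], c[improved]
    gbest = pbest[pbest_cost.argmin()].copy()

print(f"best training error: {pbest_cost.min():.3f}")
```

Swapping the PSO update for bat-algorithm, genetic-algorithm, or simulated-annealing operators changes only the inner loop, which is what makes a fusion of the four optimizers straightforward to prototype on the same cost function.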

Keywords

Speech emotion · GWCC · MFCC · Pitch · Stress · Neural network


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. M.B.E.S. College of Engineering, Ambajogai, India
  2. Terna Engineering College, Navi-Mumbai, India
