Abstract
Emotion recognition in speech signals has received much attention recently because of its usefulness in many applications associated with human-computer interaction. Fundamental frequency detection in a speech signal is one of the most crucial factors in successful emotion recognition. In this work, the parameters of an autocorrelation-based algorithm for fundamental frequency detection are analysed using the Berlin emotion speech database (EMO-DB) as an example. The obtained results show that lower-than-standard values of the upper limit of the analysed frequency range tend to improve the classification outcome. Statistics of prosody contours and Mel-frequency cepstral coefficients (MFCC) were used to construct the feature set, and a support vector machine (SVM) was used as the classifier, yielding high recognition rates.
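To illustrate how the upper limit of the analysed frequency range enters an autocorrelation-based detector, the following is a minimal sketch and not the implementation used in the paper. The function name `estimate_f0`, the parameters `f_min` and `f_max`, and their default values are assumptions chosen for illustration only. The sketch picks the lag with the highest normalised autocorrelation; the upper frequency limit determines the shortest lag that is searched.

```python
import math


def estimate_f0(frame, sample_rate, f_min=75.0, f_max=600.0):
    """Estimate F0 (Hz) of one frame from its autocorrelation peak.

    f_min and f_max are hypothetical, illustrative search limits;
    they are not values taken from the paper.
    Returns None for a silent (zero-energy) frame.
    """
    # The upper frequency limit sets the shortest lag considered,
    # the lower frequency limit sets the longest one.
    min_lag = max(1, math.ceil(sample_rate / f_max))
    max_lag = min(len(frame) - 1, int(sample_rate / f_min))
    if min_lag > max_lag:
        raise ValueError("frame too short for the requested frequency range")

    # Remove the mean so a constant offset does not dominate the result.
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return None

    best_lag, best_r = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag, normalised by the frame energy.
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


if __name__ == "__main__":
    rate = 16000
    # Synthetic 200 Hz tone, 40 ms long, standing in for a voiced frame.
    tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(640)]
    print(estimate_f0(tone, rate, f_max=600.0))
    print(estimate_f0(tone, rate, f_max=300.0))
```

Lowering `f_max` narrows the set of candidate lags, which is the parameter the abstract reports as influencing the classification outcome. A practical detector would typically add a voicing decision and some form of octave-error handling, both omitted here for brevity.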
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stasiak, B., Rychlicki-Kicior, K. (2012). Fundamental Frequency Extraction in Speech Emotion Recognition. In: Dziech, A., Czyżewski, A. (eds) Multimedia Communications, Services and Security. MCSS 2012. Communications in Computer and Information Science, vol 287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30721-8_29
DOI: https://doi.org/10.1007/978-3-642-30721-8_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30720-1
Online ISBN: 978-3-642-30721-8
eBook Packages: Computer Science, Computer Science (R0)