Fundamental Frequency Extraction in Speech Emotion Recognition

  • Bartłomiej Stasiak
  • Krzysztof Rychlicki-Kicior
Part of the Communications in Computer and Information Science book series (CCIS, volume 287)


Emotion recognition in speech signals has recently received much attention because of its usefulness in many applications involving human-computer interaction. Accurate fundamental frequency (F0) extraction is one of the most crucial factors in successful emotion recognition. In this work, the parameters of an autocorrelation-based algorithm for fundamental frequency detection are analysed using the Berlin emotional speech database (EMO-DB). The results show that lower-than-standard values of the upper limit of the analysed frequency range tend to improve the classification outcome. The feature set is built from statistics of prosody contours and Mel-frequency cepstral coefficients (MFCC), and a support vector machine (SVM) is used as the classifier, yielding high recognition rates.
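
The role of the upper frequency limit in autocorrelation-based F0 detection can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (estimate_f0, synth_frame), the default limits of 75 Hz and 600 Hz, the sampling rate, and the toy signal are all assumptions chosen for illustration. The estimator simply picks the highest autocorrelation peak among the lags that correspond to the allowed frequency range.

```python
import math


def estimate_f0(frame, sample_rate, f_min=75.0, f_max=600.0):
    """Estimate F0 (Hz) of one frame from its autocorrelation peak.

    f_min and f_max bound the analysed frequency range. The defaults are
    illustrative assumptions, not values taken from the paper.
    Returns None when no positive autocorrelation peak is found.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]            # remove the DC offset
    energy = sum(s * s for s in x)
    lag_min = max(1, math.ceil(sample_rate / f_max))   # shortest period allowed
    lag_max = min(n - 1, int(sample_rate / f_min))     # longest period allowed
    if energy == 0.0 or lag_min > lag_max:
        return None
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalised by the frame energy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag is not None else None


def synth_frame(sample_rate, length, f0, a_fund, a_harm2):
    """Hypothetical test signal: a fundamental plus its second harmonic."""
    return [
        a_fund * math.sin(2 * math.pi * f0 * i / sample_rate)
        + a_harm2 * math.sin(2 * math.pi * 2 * f0 * i / sample_rate)
        for i in range(length)
    ]


if __name__ == "__main__":
    # Weak 150 Hz fundamental, strong 300 Hz second harmonic.
    frame = synth_frame(12000, 480, 150.0, a_fund=0.1, a_harm2=1.0)
    print("upper limit 600 Hz:", estimate_f0(frame, 12000, f_max=600.0))
    print("upper limit 250 Hz:", estimate_f0(frame, 12000, f_max=250.0))
```

On this toy signal the wider range is expected to lock onto the strong second harmonic (an estimate near 300 Hz), while the lower upper limit excludes that lag and yields an estimate near the true 150 Hz. This shows only how the limit changes which peak is selected; it does not reproduce the experiments on EMO-DB.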


speech emotion recognition; pitch extraction; prosody contours





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Bartłomiej Stasiak (1)
  • Krzysztof Rychlicki-Kicior (1)
  1. Institute of Information Technology, Technical University of Łódź, Łódź, Poland
