
Novel Temporal and Spectral Features Derived from TEO for Classification Normal and Dysphonic Voices

Chapter in: Frontiers in Computer Education

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 133)

Abstract

In this paper, temporal features (zero-crossing rate and short-time energy) and spectral features (spectral flux and spectral centroid) are derived from the Teager energy operator (TEO) profile of the speech waveform. The efficacy of these features for classifying normal and dysphonic voices is analyzed by comparing their performance with that of features derived from the linear prediction (LP) residual and from the speech waveform. In addition, fusion of these features with the state-of-the-art Mel frequency cepstral coefficient (MFCC) feature set is investigated to determine whether they provide complementary information. A 2nd-order polynomial classifier is used, and experiments are carried out on a subset of the Massachusetts Eye and Ear Infirmary (MEEI) database.
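The TEO-based feature extraction summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard discrete TEO, Ψ[x(n)] = x²(n) − x(n−1)x(n+1), and the frame length, hop size, Hamming window, and the exact definitions of spectral flux (squared difference between sum-normalized magnitude spectra of consecutive frames) and spectral centroid are assumptions chosen for illustration. The function names `teager_energy`, `frame_signal`, and `teo_features` are hypothetical.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input (no edge padding).
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row).

    Trailing samples that do not fill a whole frame are dropped.
    """
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def teo_features(x, fs, frame_len=400, hop=160):
    """Per-frame [ZCR, short-time energy, spectral centroid, spectral flux]
    computed on the TEO profile of x. Frame/hop sizes are assumed defaults."""
    psi = teager_energy(x)
    frames = frame_signal(psi, frame_len, hop)

    # Temporal features. The TEO profile can go negative, so its ZCR is defined.
    signs = np.where(frames >= 0, 1, -1)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    ste = np.sum(frames ** 2, axis=1)

    # Spectral features from the Hamming-windowed magnitude spectrum.
    eps = 1e-12
    mag = np.abs(np.fft.rfft(frames * np.hamming(frame_len), axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    centroid = mag @ freqs / (mag.sum(axis=1) + eps)  # in Hz

    # Flux: change between consecutive normalized spectra (first frame gets 0).
    norm = mag / (mag.sum(axis=1, keepdims=True) + eps)
    flux = np.concatenate(([0.0], np.sum(np.diff(norm, axis=0) ** 2, axis=1)))

    return np.column_stack([zcr, ste, centroid, flux])
```

For a pure tone A cos(Ωn), the TEO profile is the constant A² sin²Ω, so its zero-crossing rate is zero and its spectral flux is essentially zero, which makes a convenient sanity check. The same framing and feature code can be applied to an LP residual or to the raw waveform for the comparisons the abstract describes.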

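The 2nd-order polynomial classifier mentioned in the abstract can likewise be sketched. The version below follows the general least-squares formulation commonly used for polynomial classifiers: each feature vector is expanded into all monomials up to degree 2, one weight vector per class is fit by least squares against 1/0 targets, and an utterance is assigned to the class with the highest average frame score. The chapter's training details (feature normalization, class balancing, decision threshold) are not given in the abstract, so this is an assumption-laden sketch; `poly2_expand`, `train_poly_models`, and `classify` are hypothetical names.

```python
import numpy as np
from itertools import combinations_with_replacement


def poly2_expand(X):
    """Map each row x to [1, x_i, x_i * x_j for i <= j]: all monomials up to degree 2."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    n, d = X.shape
    quad = [X[:, i] * X[:, j] for i, j in combinations_with_replacement(range(d), 2)]
    return np.column_stack([np.ones(n), X] + quad)


def train_poly_models(frames_by_class):
    """Fit one least-squares model per class (target 1 for its frames, 0 for the rest).

    frames_by_class: list of (n_frames_k, d) arrays, one per class.
    Returns an array of shape (n_classes, expanded_dim).
    """
    P = np.vstack([poly2_expand(f) for f in frames_by_class])
    labels = np.concatenate([np.full(len(f), k) for k, f in enumerate(frames_by_class)])
    models = []
    for k in range(len(frames_by_class)):
        target = (labels == k).astype(float)
        w, *_ = np.linalg.lstsq(P, target, rcond=None)
        models.append(w)
    return np.array(models)


def classify(models, frames):
    """Average per-frame scores w_k . p(x) over an utterance; return the best class index."""
    scores = poly2_expand(frames) @ models.T  # shape (n_frames, n_classes)
    return int(np.argmax(scores.mean(axis=0)))
```

Because every frame's targets sum to one and the expansion contains a constant term, the two models' scores sum to one for any input, so for a two-class task such as normal versus dysphonic this is equivalent to thresholding one model's average score at 0.5.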


Author information


Correspondence to Hemant A. Patil.


Copyright information

© 2012 Springer-Verlag GmbH Berlin Heidelberg

About this chapter

Cite this chapter

Patil, H.A., Baljekar, P.N., Basu, T.K. (2012). Novel Temporal and Spectral Features Derived from TEO for Classification Normal and Dysphonic Voices. In: Sambath, S., Zhu, E. (eds) Frontiers in Computer Education. Advances in Intelligent and Soft Computing, vol 133. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27552-4_76


  • DOI: https://doi.org/10.1007/978-3-642-27552-4_76

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27551-7

  • Online ISBN: 978-3-642-27552-4

  • eBook Packages: Engineering, Engineering (R0)
