Novel Temporal and Spectral Features Derived from TEO for Classification of Normal and Dysphonic Voices

Patil, Hemant A.; Baljekar, Pallavi N.; Basu, T. K.

doi:10.1007/978-3-642-27552-4_76


Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 133)


Abstract

In this paper, temporal features (zero-crossing rate and short-time energy) and spectral features (spectral flux and spectral centroid) are derived from the Teager energy operator (TEO) profile of the speech waveform. The efficacy of these features for classifying normal and dysphonic voices is analyzed by comparing their performance with that of the features derived from the linear prediction (LP) residual and from the speech waveform. In addition, the effectiveness of fusing these features with the state-of-the-art Mel frequency cepstral coefficient (MFCC) feature set is investigated to determine whether they carry complementary information. The classifier used is a 2nd-order polynomial classifier, and the experiments are carried out on a subset of the Massachusetts Eye and Ear Infirmary (MEEI) database.
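As an illustration of the feature pipeline named in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes the common discrete form of the TEO, psi[n] = x[n]^2 - x[n-1] x[n+1], and textbook definitions of the four features. The frame length, hop size, function names, and the synthetic test tone are all hypothetical choices made for this example; the chapter's actual parameters are not given in the abstract.

```python
import cmath
import math


def teager_energy(x):
    """Discrete TEO: psi[n] = x[n]^2 - x[n-1] * x[n+1], for interior samples."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frames(signal, frame_len, hop):
    """Split a sequence into overlapping frames, dropping the ragged tail."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(v * v for v in frame) / len(frame)


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, fine for short frames)."""
    n = len(frame)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_centroid(mag, sample_rate, frame_len):
    """Magnitude-weighted mean frequency in Hz."""
    total = sum(mag)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / frame_len * m for k, m in enumerate(mag)) / total


def spectral_flux(mag, prev_mag):
    """Sum of squared differences between consecutive magnitude spectra."""
    return sum((a - b) ** 2 for a, b in zip(mag, prev_mag))


def teo_features(signal, sample_rate, frame_len=64, hop=32):
    """Per-frame [ZCR, energy, centroid, flux] computed on the TEO profile."""
    profile = teager_energy(signal)
    feats, prev_mag = [], None
    for frame in frames(profile, frame_len, hop):
        mag = magnitude_spectrum(frame)
        flux = 0.0 if prev_mag is None else spectral_flux(mag, prev_mag)
        feats.append([zero_crossing_rate(frame), short_time_energy(frame),
                      spectral_centroid(mag, sample_rate, frame_len), flux])
        prev_mag = mag
    return feats


# Synthetic check (assumed values): for amp*cos(w*n) the TEO profile
# is the constant amp^2 * sin(w)^2.
fs, f0, amp = 8000, 200.0, 0.5
w = 2 * math.pi * f0 / fs
tone = [amp * math.cos(w * n) for n in range(400)]
profile = teager_energy(tone)
features = teo_features(tone, fs)
```

For a pure tone the TEO profile is flat, so its zero-crossing rate and spectral flux are essentially zero. Per-frame vectors of this kind could in principle be concatenated with MFCCs for the fusion experiments the abstract describes, although how the authors combined them is not stated here.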



Author information

Authors and Affiliations

Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, Gujarat, India
Hemant A. Patil
Manipal Institute of Technology (MIT), Manipal, Karnataka, India
Pallavi N. Baljekar
Institute of Technology and Marine Engineering (ITME), Amira, West Bengal, India
T. K. Basu


Corresponding author

Correspondence to Hemant A. Patil.

Editor information

Editors and Affiliations

South China Normal University, Guangzhou, 510631, People's Republic of China
Sabo Sambath
South China Normal University, Guangzhou, 510631, People's Republic of China
Egui Zhu


About this chapter

Cite this chapter

Patil, H.A., Baljekar, P.N., Basu, T.K. (2012). Novel Temporal and Spectral Features Derived from TEO for Classification of Normal and Dysphonic Voices. In: Sambath, S., Zhu, E. (eds) Frontiers in Computer Education. Advances in Intelligent and Soft Computing, vol 133. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27552-4_76


DOI: https://doi.org/10.1007/978-3-642-27552-4_76
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27551-7
Online ISBN: 978-3-642-27552-4
eBook Packages: Engineering (R0)
