Non-linear Speech Feature Extraction for Phoneme Classification and Speaker Recognition

Chetouani, Mohamed; Faundez-Zanuy, Marcos; Gas, Bruno; Zarader, Jean-Luc

doi:10.1007/11520153_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3445)

Included in the following conference series:

International School on Neural Networks, Initiated by IIASS and EMFCSC

Abstract

In this paper we propose a new feature extraction algorithm based on non-linear prediction: the Neural Predictive Coding (NPC) model, an extension of the classical Linear Predictive Coding (LPC) model. We apply this model to two tasks: phoneme classification and speaker identification. For phoneme classification, the NPC model is trained with a Minimum Classification Error (MCE) criterion; experiments on the NTIMIT database show improved classification rates. For speaker identification, we propose a new feature extraction principle based on the NPC model and investigate different initialization methods. The new method performs better than the traditional ones (LPC, MFCC and PLP).
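
For orientation, the classical LPC baseline that the NPC model extends can be sketched as follows. This is a minimal illustration of linear prediction by the autocorrelation method, not the authors' NPC algorithm; the function names `lpc_coefficients` and `prediction_error` are hypothetical and chosen only for this example. A non-linear predictor such as NPC would replace the weighted sum below with a neural network, whose details are not given in the abstract.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Estimate linear-prediction coefficients a[0..order-1] for one frame,
    so that x[n] is approximated by sum_k a[k-1] * x[n-k], k = 1..order.
    Uses the autocorrelation method (hypothetical helper, for illustration)."""
    x = np.asarray(frame, dtype=float)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])


def prediction_error(frame, a):
    """Residual x[n] - predicted[n] for n >= len(a), given coefficients a."""
    x = np.asarray(frame, dtype=float)
    p = len(a)
    # x[n-p:n][::-1] is (x[n-1], ..., x[n-p]), matching a[0..p-1].
    predicted = np.array([np.dot(a, x[n - p:n][::-1]) for n in range(p, len(x))])
    return x[p:] - predicted
```

In an LPC front end, the coefficient vector returned for each short speech frame (or a transform of it) serves as the feature vector; the frame length and prediction order are design choices that the abstract does not specify.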

Author information

Authors and Affiliations

Laboratoire des Instruments et Systèmes d’Ile-De-France, Université Paris VI, Paris, France
Mohamed Chetouani, Bruno Gas & Jean-Luc Zarader
Escola Universitària Politècnica de Mataró, Barcelona, Spain
Marcos Faundez-Zanuy

Editor information

Editors and Affiliations

CNRS LTCI/TSI Paris, 46 rue Barrault, 75634, Paris Cedex 13, France
Gérard Chollet
Department of Psychology, Second University of Naples, and IIASS, Via Pellegrino 19, 84019, Vietri sul Mare, SA, Italy
Anna Esposito
Escola Universitària Politècnica de Mataró, Universitat Politècnica de Catalunya, Barcelona, Spain
Marcos Faundez-Zanuy
Dipartimento di Fisica “E.R. Caianiello”, Università degli Studi di Salerno, Via S. Allende, 84081, Baronissi, SA, Italy
Maria Marinaro

Cite this paper

Chetouani, M., Faundez-Zanuy, M., Gas, B., Zarader, JL. (2005). Non-linear Speech Feature Extraction for Phoneme Classification and Speaker Recognition. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol 3445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11520153_16

DOI: https://doi.org/10.1007/11520153_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27441-4
Online ISBN: 978-3-540-31886-6
eBook Packages: Computer Science; Computer Science (R0)
