Abstract
In this work, the capabilities of speech cepstra for discriminating text- and speaker-related information are investigated. For this purpose, the Bhattacharyya distance is used as the measure of discrimination. The study covers static and dynamic cepstra derived using linear prediction analysis (LPCC) as well as mel-frequency analysis (MFCC). The investigations also include an assessment of linear prediction-based mel-frequency cepstral coefficients (LP-MFCC) as an alternative speech feature type. It is shown experimentally that whilst contamination in speech adversely affects the performance of all types of cepstra, the effects are more severe in the case of MFCC. Furthermore, it is shown that, with a combination of static and dynamic features, LP-MFCC exhibit the best discrimination capabilities in almost all experimental cases.
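As an illustration of the discrimination measure named in the abstract, the following is a minimal sketch of the Bhattacharyya distance between two classes of feature vectors. It assumes each class is modelled as a Gaussian with a diagonal covariance matrix; this modelling choice, the function name `bhattacharyya_diag`, and its parameters are assumptions made for the example and are not taken from the chapter, which may use a different formulation (for instance, full covariance matrices).

```python
import math
from typing import Sequence


def bhattacharyya_diag(
    mean1: Sequence[float],
    var1: Sequence[float],
    mean2: Sequence[float],
    var2: Sequence[float],
) -> float:
    """Bhattacharyya distance between two Gaussians.

    ASSUMPTION: both covariance matrices are diagonal, so each class is
    described by a per-dimension mean and variance only.
    """
    if not (len(mean1) == len(var1) == len(mean2) == len(var2)):
        raise ValueError("all inputs must have the same dimensionality")
    if any(v <= 0.0 for v in list(var1) + list(var2)):
        raise ValueError("variances must be strictly positive")

    mean_term = 0.0  # separation caused by the difference in means
    cov_term = 0.0   # separation caused by the difference in variances
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v_avg = 0.5 * (v1 + v2)
        mean_term += (m1 - m2) ** 2 / v_avg
        cov_term += math.log(v_avg / math.sqrt(v1 * v2))

    return 0.125 * mean_term + 0.5 * cov_term


# Hypothetical two-dimensional cepstral statistics for two classes.
d = bhattacharyya_diag([0.0, 1.0], [1.0, 2.0], [2.0, 1.0], [1.0, 2.0])
print(d)
```

Under this sketch, a larger distance indicates that the two classes (for example, two speakers or two speech units) are more separable in the chosen feature space, and a distance of zero means the two modelled distributions are identical.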
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Malegaonkar, A., Ariyaeeinia, A., Sivakumaran, P., Pillay, S. (2008). Discrimination Effectiveness of Speech Cepstral Features. In: Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M. (eds) Biometrics and Identity Management. BioID 2008. Lecture Notes in Computer Science, vol 5372. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89991-4_10
DOI: https://doi.org/10.1007/978-3-540-89991-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89990-7
Online ISBN: 978-3-540-89991-4
eBook Packages: Computer Science (R0)