Abstract
Speaker recognition, in which a person is identified from his or her voice, is an important task in security applications. Because no two individuals have the same voice, and speakers also differ in speaking style, rhythm, and tone, a speaker can be recognized by extracting low-level spectral features and high-level behavioural features. This paper presents a robust speaker recognition approach that combines spectral and prosodic features to improve performance. The system has been tested under different signal-to-noise ratio (SNR) levels. Two subsystems are implemented: (i) speaker recognition based on low-level features such as Mel-frequency cepstral coefficient (MFCC) features, and (ii) a combined system using MFCC and prosodic features. Both subsystems achieve competitive results in classifying different speakers. Experiments are conducted on the interactive emotional dyadic motion capture (IEMOCAP) database. Fusing low-level and prosodic features yields an improvement in accuracy of approximately 15–20%.
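The two ideas named in the abstract, testing at a controlled SNR and fusing MFCC with prosodic features, can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption: white Gaussian noise as the noise type, feature-level fusion by concatenation, the function names, the synthetic sine wave standing in for speech, and the vector sizes (13 and 3).

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng):
    """Return (noisy_signal, noise) with white Gaussian noise scaled to snr_db."""
    signal_power = np.mean(signal ** 2)
    noise = rng.standard_normal(signal.shape)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(signal_power / noise_power) equals snr_db.
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = noise * np.sqrt(target_noise_power / noise_power)
    return signal + noise, noise

def measured_snr_db(signal, noise):
    """SNR in dB computed from the mean power of signal and noise."""
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

def fuse_features(mfcc_vec, prosodic_vec):
    """Feature-level fusion by concatenation (assumed scheme, not from the paper)."""
    return np.concatenate([mfcc_vec, prosodic_vec])

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 220.0 * t)  # synthetic stand-in for one second of speech
noisy, noise = add_noise_at_snr(clean, 10.0, rng)
snr = measured_snr_db(clean, noise)

# Placeholder vectors: 13 MFCCs and 3 prosodic values (hypothetical sizes).
fused = fuse_features(np.zeros(13), np.zeros(3))
```

In a real experiment the placeholder vectors would be replaced by features extracted from `noisy`, and the fused vector would be passed to whichever classifier the system uses.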
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jagdale, S.M., Shinde, A.A., Chitode, J.S. (2020). Robust Speaker Recognition Based on Low-Level- and Prosodic-Level-Features. In: Jain, V., Chaudhary, G., Taplamacioglu, M., Agarwal, M. (eds) Advances in Data Sciences, Security and Applications. Lecture Notes in Electrical Engineering, vol 612. Springer, Singapore. https://doi.org/10.1007/978-981-15-0372-6_20
DOI: https://doi.org/10.1007/978-981-15-0372-6_20
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0371-9
Online ISBN: 978-981-15-0372-6
eBook Packages: Intelligent Technologies and Robotics (R0)