Robust Speaker Recognition Based on Low-Level- and Prosodic-Level-Features

Jagdale, S. M.; Shinde, A. A.; Chitode, J. S.

doi:10.1007/978-981-15-0372-6_20

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 612)

Abstract

Speaker recognition, in which a person is identified from his or her voice, is an important task in security applications. Because no two individuals have the same voice, and speakers also differ in speaking style, rhythm, tone, etc., a speaker can be recognized by extracting low-level spectral features and high-level behavioural features. This paper presents a robust speaker recognition approach that combines spectral and prosodic features to improve performance. The system has been tested under different signal-to-noise ratio (SNR) levels. Two subsystems are implemented: (i) speaker recognition based on low-level features such as Mel-frequency cepstral coefficient (MFCC) features, and (ii) a combined system using MFCC and prosodic features. Both subsystems achieve competitive results in classifying different speakers. Experiments are conducted on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database. The fusion of low-level and prosodic features achieves an improvement in accuracy of approximately 15–20%.
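The abstract does not say how the noisy test conditions were produced or how the two feature streams were combined, so the following is only a minimal sketch under stated assumptions: white Gaussian noise is mixed in at a chosen SNR, and fusion is done at the feature level by normalizing each stream and concatenating per frame. The function names, the 13-coefficient and 2-feature dimensions, and the random placeholder arrays standing in for real MFCC and prosodic features are all hypothetical and are not taken from the paper.

```python
import numpy as np


def add_noise_at_snr(signal, snr_db, rng):
    """Mix white Gaussian noise into `signal` so the result has the given SNR in dB.

    Assumption: the noise is white and Gaussian; the chapter's abstract
    does not state which noise type was used.
    """
    signal_power = np.mean(signal ** 2)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.standard_normal(signal.shape)
    # Rescale so the realized noise power matches the target exactly.
    noise *= np.sqrt(target_noise_power / np.mean(noise ** 2))
    return signal + noise


def measured_snr_db(clean, noisy):
    """Compute the SNR in dB of `noisy` relative to the known clean signal."""
    noise = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))


def fuse_features(spectral, prosodic):
    """Feature-level fusion: z-normalize each stream, then concatenate per frame.

    Both inputs have shape (n_frames, n_dims). Normalization keeps one
    stream from dominating the other purely because of its scale.
    """
    def znorm(x):
        return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

    return np.hstack([znorm(spectral), znorm(prosodic)])


rng = np.random.default_rng(0)

# One second of a 220 Hz tone at 16 kHz as a stand-in for a speech signal.
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 220.0 * t)
noisy = add_noise_at_snr(clean, snr_db=10.0, rng=rng)

# Placeholder features (hypothetical): 100 frames of 13 MFCC-like values
# and 2 prosodic-like values (for example pitch and energy).
mfcc_like = rng.normal(size=(100, 13))
prosodic_like = rng.normal(size=(100, 2))
fused = fuse_features(mfcc_like, prosodic_like)

print(fused.shape)  # (100, 15)
```

In a real experiment, the placeholder arrays would be replaced by features extracted from the noisy audio at each SNR level, and the fused vectors would be passed to whatever classifier the system uses.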

Author information

Authors and Affiliations

Bharati Vidyapeeth (Deemed to be) University COE, Pune, Maharashtra, India
S. M. Jagdale
Department of Electronics, Bharati Vidyapeeth (Deemed to be) University COE, Pune, Maharashtra, India
A. A. Shinde & J. S. Chitode

Corresponding author

Correspondence to S. M. Jagdale.

Editor information

Editors and Affiliations

Bharati Vidyapeeth’s College of Engineering, New Delhi, Delhi, India
Vanita Jain
Bharati Vidyapeeth’s College of Engineering, New Delhi, Delhi, India
Gopal Chaudhary
Gazi University, Ankara, Turkey
M. Cengiz Taplamacioglu
Indian Institute of Technology Mumbai, Mumbai, Maharashtra, India
M. S. Agarwal

About this paper

Cite this paper

Jagdale, S.M., Shinde, A.A., Chitode, J.S. (2020). Robust Speaker Recognition Based on Low-Level- and Prosodic-Level-Features. In: Jain, V., Chaudhary, G., Taplamacioglu, M., Agarwal, M. (eds) Advances in Data Sciences, Security and Applications. Lecture Notes in Electrical Engineering, vol 612. Springer, Singapore. https://doi.org/10.1007/978-981-15-0372-6_20

DOI: https://doi.org/10.1007/978-981-15-0372-6_20
Published: 03 December 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0371-9
Online ISBN: 978-981-15-0372-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
