
Robust Speaker Recognition Based on Low-Level- and Prosodic-Level-Features

  • Conference paper
Advances in Data Sciences, Security and Applications

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 612)

Abstract

Speaker recognition is an important task in security applications, in which a person is identified from his or her voice. Because no two individuals have the same voice, and because speakers differ in speaking style, rhythm, tone, and related traits, a speaker can be recognized by extracting low-level spectral features and high-level behavioural features. This paper presents a robust speaker recognition approach that combines spectral and prosodic features to improve performance; the system has been tested under different SNR levels. Two subsystems are implemented: (i) speaker recognition based on low-level features, namely Mel-frequency cepstral coefficient (MFCC) features, and (ii) a combined system using MFCC and prosodic features. Both subsystems achieve competitive results in classifying different speakers. Experiments are conducted on the interactive emotional dyadic motion capture (IEMOCAP) database. Fusing low-level and prosodic features yields an approximate 15–20% improvement in accuracy.
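A minimal sketch of the pipeline the abstract describes is given below: low-level MFCC features plus prosodic cues (pitch and energy), fused by concatenation into one utterance-level vector, with noisy test conditions simulated at a chosen SNR. librosa and scikit-learn are assumed here; the paper does not name its toolchain, and the feature statistics, pitch range, and classifier are illustrative choices, not the authors' exact configuration.

import numpy as np
import librosa

def mfcc_features(y, sr, n_mfcc=13):
    """Fixed-length spectral vector: mean and std of frame-level MFCCs."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def prosodic_features(y, sr):
    """Simple prosodic descriptors: pitch (F0) and energy statistics."""
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)   # rough speech F0 range
    rms = librosa.feature.rms(y=y)[0]               # frame-level energy
    return np.array([f0.mean(), f0.std(), rms.mean(), rms.std()])

def fused_features(y, sr):
    """Concatenate the two streams into one utterance-level vector."""
    return np.concatenate([mfcc_features(y, sr), prosodic_features(y, sr)])

def add_noise_at_snr(y, snr_db, rng=None):
    """Mix white noise into a clean signal at a target SNR in dB."""
    if rng is None:
        rng = np.random.default_rng(0)
    noise = rng.standard_normal(len(y))
    # Scale the noise so that 10*log10(P_signal / P_noise) == snr_db.
    scale = np.sqrt(np.mean(y**2) / (np.mean(noise**2) * 10**(snr_db / 10)))
    return y + scale * noise

# Usage: one fused vector per utterance, then any closed-set classifier,
# e.g. an SVM (wavs and speaker_ids are hypothetical):
#   from sklearn.svm import SVC
#   X = np.stack([fused_features(*librosa.load(p, sr=16000)) for p in wavs])
#   clf = SVC().fit(X, speaker_ids)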



Author information

Correspondence to S. M. Jagdale.


Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Jagdale, S.M., Shinde, A.A., Chitode, J.S. (2020). Robust Speaker Recognition Based on Low-Level- and Prosodic-Level-Features. In: Jain, V., Chaudhary, G., Taplamacioglu, M., Agarwal, M. (eds) Advances in Data Sciences, Security and Applications. Lecture Notes in Electrical Engineering, vol 612. Springer, Singapore. https://doi.org/10.1007/978-981-15-0372-6_20
