Recent advances in speaker recognition

  • Text-independent Speaker Authentication
  • Conference paper
Audio- and Video-based Biometric Person Authentication (AVBPA 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1206)

Abstract

This paper introduces recent advances in speaker recognition technology. The first part discusses general topics and issues. The second part covers more specific topics of recent interest that have led to new approaches and techniques: VQ- and ergodic-HMM-based text-independent recognition methods, a text-prompted recognition method, parameter/distance normalization and model adaptation techniques, and methods of updating models and a priori thresholds in speaker verification. Although speaker recognition has seen many recent advances and successes, many problems still lack good solutions. The last part of the paper describes 16 open questions about speaker recognition. The paper concludes with a short assessment of the current status and future possibilities.

Editor information

Josef Bigün Gérard Chollet Gunilla Borgefors

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Furui, S. (1997). Recent advances in speaker recognition. In: Bigün, J., Chollet, G., Borgefors, G. (eds) Audio- and Video-based Biometric Person Authentication. AVBPA 1997. Lecture Notes in Computer Science, vol 1206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0016001

  • DOI: https://doi.org/10.1007/BFb0016001

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-62660-2

  • Online ISBN: 978-3-540-68425-1

  • eBook Packages: Springer Book Archive
