Recent advances in speaker recognition

Furui, Sadaoki

doi:10.1007/BFb0016001

Sadaoki Furui^1,2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1206)

Included in the following conference series:

International Conference on Audio- and Video-Based Biometric Person Authentication

Abstract

This paper introduces recent advances in speaker recognition technology. The first part discusses general topics and issues. The second part is devoted to a discussion of more specific topics of recent interest that have led to interesting new approaches and techniques. They include VQ- and ergodic-HMM-based text-independent recognition methods, a text-prompted recognition method, parameter/distance normalization and model adaptation techniques, and methods of updating models and a priori thresholds in speaker verification. Although many recent advances and successes have been achieved in speaker recognition, there are still many problems for which good solutions remain to be found. The last part of this paper describes 16 open questions about speaker recognition. The paper concludes with a short discussion assessing the current status and future possibilities.

Author information

Authors and Affiliations

NTT Human Interface Laboratories, 3-9-11, Midori-cho, Musashino-shi, 180, Tokyo, Japan
Sadaoki Furui
Tokyo Institute of Technology, 2-12-1, O-okayama, Meguro-ku, 152, Tokyo, Japan
Sadaoki Furui

Editor information

Josef Bigün, Gérard Chollet, Gunilla Borgefors

Cite this paper

Furui, S. (1997). Recent advances in speaker recognition. In: Bigün, J., Chollet, G., Borgefors, G. (eds) Audio- and Video-based Biometric Person Authentication. AVBPA 1997. Lecture Notes in Computer Science, vol 1206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0016001

DOI: https://doi.org/10.1007/BFb0016001
Published: 10 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62660-2
Online ISBN: 978-3-540-68425-1
eBook Packages: Springer Book Archive
