Abstract
This paper reviews recent advances in speaker recognition technology. The first part discusses general topics and issues. The second part covers more specific topics of recent interest that have led to new approaches and techniques. These include VQ- and ergodic-HMM-based text-independent recognition methods, a text-prompted recognition method, parameter/distance normalization and model adaptation techniques, and methods of updating models and a priori thresholds in speaker verification. Although speaker recognition has seen many recent advances and successes, many problems still lack good solutions. The last part of the paper describes 16 open questions about speaker recognition. The paper concludes with a short assessment of the current status and future possibilities.
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Furui, S. (1997). Recent advances in speaker recognition. In: Bigün, J., Chollet, G., Borgefors, G. (eds) Audio- and Video-based Biometric Person Authentication. AVBPA 1997. Lecture Notes in Computer Science, vol 1206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0016001
DOI: https://doi.org/10.1007/BFb0016001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62660-2
Online ISBN: 978-3-540-68425-1
eBook Packages: Springer Book Archive