Text-Independent Speaker Verification from Mixed Speech of Multiple Speakers via Using Pole Distribution of Speech Signals

Tagomori, Toshiki; Matsuo, Kazuya; Kurogi, Shuichi

doi:10.1007/978-3-030-04224-0_37

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11306)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

This paper presents a method for text-independent speaker verification from mixed speech of multiple speakers, using the pole distribution of speech signals. The poles of a speech signal, derived from the all-pole speech production model, are obtained with a neural net called bagging CAN2 (competitive associative net 2), which learns an efficient piecewise linear approximation of a nonlinear function. We present an analysis showing that the poles of mixed speech are expected to consist of those poles that lie farther from the zeros of the ARMA (autoregressive moving average) models of the constituent speeches. Through experiments with unmixed and mixed speech, we show that the distribution of the poles has two typical regions: one contains poles that change suddenly when the speech changes from unmixed to mixed, and the other contains poles that change continuously with the mixing weight; this observation is considered to support the analysis. In speaker verification experiments, we obtain the following properties of recall and precision as measures of verification performance: recall decreases suddenly when the speech changes from unmixed to mixed, whereas precision does not decrease much as the SNR (signal-to-noise ratio) decreases, until the SNR falls below 0 dB. Finally, we show the usefulness of the present method.
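
The quantities named in the abstract (poles of an all-pole model, mixing at a given SNR, recall and precision) can be illustrated with a minimal sketch. It is not the authors' method: where the paper uses bagging CAN2, the sketch swaps in an ordinary least-squares fit of a second-order all-pole model, and a damped cosine stands in for speech. All function names (`ar2_poles`, `mix_at_snr`, `precision_recall`, `damped_tone`) and the test signals are assumptions made for illustration only.

```python
import cmath
import math


def ar2_poles(x):
    """Least-squares fit of x[t] = a1*x[t-1] + a2*x[t-2]; returns the two poles.

    Assumption: plain least squares replaces the paper's bagging CAN2.
    """
    r11 = r12 = r22 = b1 = b2 = 0.0
    for t in range(2, len(x)):
        r11 += x[t - 1] * x[t - 1]
        r12 += x[t - 1] * x[t - 2]
        r22 += x[t - 2] * x[t - 2]
        b1 += x[t] * x[t - 1]
        b2 += x[t] * x[t - 2]
    # Solve the 2x2 normal equations by Cramer's rule.
    det = r11 * r22 - r12 * r12
    a1 = (b1 * r22 - b2 * r12) / det
    a2 = (b2 * r11 - b1 * r12) / det
    # Poles are the roots of z^2 - a1*z - a2 = 0.
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
    return (a1 + disc) / 2.0, (a1 - disc) / 2.0


def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(target, interferer, snr_db):
    """Add the interferer to the target, scaled so that the
    target-to-interferer power ratio equals snr_db."""
    gain = math.sqrt(power(target) / (power(interferer) * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(target, interferer)]


def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def damped_tone(radius, angle, n=200):
    """Toy stand-in for speech: a single resonance with poles radius*exp(+-j*angle)."""
    return [radius ** t * math.cos(angle * t) for t in range(n)]


target = damped_tone(0.95, 0.6)       # hypothetical "target speaker"
interferer = damped_tone(0.90, 1.4)   # hypothetical "other speaker"

pole, _ = ar2_poles(target)
mixed = mix_at_snr(target, interferer, snr_db=0.0)
mixed_pole, _ = ar2_poles(mixed)

print("unmixed pole: radius", abs(pole), "angle", cmath.phase(pole))
print("mixed pole:   radius", abs(mixed_pole), "angle", cmath.phase(mixed_pole))
print("precision, recall:", precision_recall(tp=8, fp=2, fn=8))
```

For the unmixed toy signal the fit recovers the generating pole (radius 0.95, angle 0.6 rad) up to floating-point error, because a damped cosine satisfies the second-order recurrence exactly. The mixed signal contains two resonances, so a second-order fit can only approximate it, and the estimated pole moves as `snr_db` is varied. Real speech would require a higher model order and framing, which this sketch leaves out.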

Author information

Authors and Affiliations

Kyushu Institute of Technology, Tobata, Kitakyushu, Fukuoka, 804-8550, Japan
Toshiki Tagomori, Kazuya Matsuo & Shuichi Kurogi

Corresponding author

Correspondence to Shuichi Kurogi.

Editor information

Editors and Affiliations

The Chinese Academy of Sciences, Beijing, China
Long Cheng
City University of Hong Kong, Kowloon, Hong Kong
Andrew Chi Sing Leung
Kobe University, Kobe, Japan
Seiichi Ozawa

About this paper

Cite this paper

Tagomori, T., Matsuo, K., Kurogi, S. (2018). Text-Independent Speaker Verification from Mixed Speech of Multiple Speakers via Using Pole Distribution of Speech Signals. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11306. Springer, Cham. https://doi.org/10.1007/978-3-030-04224-0_37

DOI: https://doi.org/10.1007/978-3-030-04224-0_37
Published: 18 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04223-3
Online ISBN: 978-3-030-04224-0
eBook Packages: Computer Science, Computer Science (R0)
