Abstract
This paper presents a method for text-independent speaker verification from the mixed speech of multiple speakers, using the pole distribution of speech signals. The poles, derived from the all-pole speech production model, are obtained with a neural network called bagging CAN2 (competitive associative net 2), which learns efficient piecewise linear approximations of nonlinear functions. We present an analysis suggesting that the poles of mixed speech are expected to consist of those poles that lie farther from the zeros of the ARMA (autoregressive moving average) models of the constituent speech signals. Experiments with unmixed and mixed speech show that the pole distribution has two typical regions: one containing poles that change abruptly when the speech changes from unmixed to mixed, and another containing poles that change continuously with the mixing weight. This result is considered to support the analysis. In speaker verification experiments, we use recall and precision as performance measures and observe the following: recall drops abruptly when the speech changes from unmixed to mixed, whereas precision does not decrease much as the SNR (signal-to-noise ratio) decreases, until the SNR falls below 0 dB. Finally, we show the usefulness of the present method.
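To make the notion of poles of an all-pole model concrete, the following minimal sketch fits a linear predictor by ordinary least squares and returns the roots of the resulting characteristic polynomial. It is an illustration only: it uses a single global least-squares fit rather than the bagging CAN2 piecewise linear approximation described in the abstract, and the function names `ar_poles` and `impulse_response_ar2`, the model order, and the synthetic test signal are all assumptions introduced here.

```python
import numpy as np


def ar_poles(signal, order):
    """Fit y[t] ~ sum_k a[k] * y[t-k] by least squares and return the poles.

    Assumption: one global linear predictor, not the bagging CAN2 model.
    """
    y = np.asarray(signal, dtype=float)
    # Column k-1 holds the lagged samples y[t-k] for t = order .. len(y)-1.
    lagged = np.column_stack(
        [y[order - k: len(y) - k] for k in range(1, order + 1)]
    )
    target = y[order:]
    coeffs, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    # Poles are the roots of z^order - a[0] z^(order-1) - ... - a[order-1].
    return np.roots(np.concatenate(([1.0], -coeffs)))


def impulse_response_ar2(radius, angle, n):
    """Hypothetical test signal: impulse response of a 2-pole resonator
    with poles at radius * exp(+/- 1j * angle)."""
    a1 = 2.0 * radius * np.cos(angle)
    a2 = -radius ** 2
    y = np.zeros(n)
    y[0] = 1.0
    y[1] = a1
    for t in range(2, n):
        y[t] = a1 * y[t - 1] + a2 * y[t - 2]
    return y


# Recover the poles of a noise-free resonator with known pole locations.
poles = ar_poles(impulse_response_ar2(0.9, 0.5, 200), order=2)
```

On this noise-free signal the estimated poles should coincide with the conjugate pair used to generate it. For real speech, the signal would be analyzed in short frames with a higher model order, and the estimated poles would scatter around the resonances of the vocal tract.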
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Tagomori, T., Matsuo, K., Kurogi, S. (2018). Text-Independent Speaker Verification from Mixed Speech of Multiple Speakers via Using Pole Distribution of Speech Signals. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11306. Springer, Cham. https://doi.org/10.1007/978-3-030-04224-0_37
DOI: https://doi.org/10.1007/978-3-030-04224-0_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04223-3
Online ISBN: 978-3-030-04224-0
eBook Packages: Computer Science, Computer Science (R0)