Advertisement

Text-Independent Speaker Verification from Mixed Speech of Multiple Speakers via Using Pole Distribution of Speech Signals

  • Toshiki Tagomori
  • Kazuya Matsuo
  • Shuichi KurogiEmail author
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11306)

Abstract

This paper presents a method of text-independent speaker verification from mixed speech of multiple speakers via using pole distribution of speech signals. The poles of speech signal derived from all-pole speech production model are obtained via a neural net called bagging CAN2 (competitive associative net 2) for learning efficient piecewise linear approximation of nonlinear function. We show an analysis that poles of mixed speech are expected to be composed of the poles farther from zeros of ARMA (autoregressive moving average) models of constituent speeches. By means of experiments using unmixed and mixed speeches, we show the distribution of the poles of speeches has two typical regions: one involves poles which change suddenly with the change of the speech from unmixed to mixed, and the other involves poles which change continuously with the change of the mixing weight, which is considered to support the analysis. We execute experiments of speaker verification, and obtain the following properties of recall and precision as measures of verification performance: the recall decreases suddenly with the change of the speech from unmixed to mixed, while the precision does not decreases so much with the decrease of SNR (signal to noise ratio) until below 0 dB. Finally, we show the usefulness of the present method.

Keywords

Text-independent speaker verification Mixed speech of multiple speakers Pole distribution of speech signals 

References

  1. 1.
    Campbell, J.P.: Speaker recognition: a tutorial. Proc. IEEE 85(9), 1437–1462 (1997)CrossRefGoogle Scholar
  2. 2.
    Beigi, H.: Fundamentals of Speaker Recognition. Springer, New York (2011).  https://doi.org/10.1007/978-0-387-77592-0CrossRefzbMATHGoogle Scholar
  3. 3.
    Kurogi S., Ueno T., Sawa M.: A batch learning method for competitive associative net and its application to function approximation. In: Proceedings of SCI 2004, vol. V, pp. 24–28 (2004)Google Scholar
  4. 4.
    Kurogi, S., Mineishi, S., Sato, S.: An analysis of speaker recognition using bagging CAN2 and pole distribution of speech signals. In: Wong, K.W., Mendis, B.S.U., Bouzerdoum, A. (eds.) ICONIP 2010, part I. LNCS, vol. 6443, pp. 363–370. Springer, Heidelberg (2010).  https://doi.org/10.1007/978-3-642-17537-4_45CrossRefGoogle Scholar
  5. 5.
    Kurogi S., Nedachi N.: Reproduction and recognition of vowels using piecewise linear predictive coefficients obtained by competitive associative nets. In: Proceedings of SICE- ICCAS2006, CD-ROM (2006)Google Scholar
  6. 6.
    Sakashita, S., Takeguchi, S., Matsuo, K., Kurogi, S.: Probabilistic prediction for text-prompted speaker verification capable of accepting spoken words with the same meaning but different pronunciations. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds.) ICONIP 2016, part IV. LNCS, vol. 9950, pp. 312–320. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46681-1_38CrossRefGoogle Scholar
  7. 7.
    Sakata, K., Sakashita, S., Matsuo, K., Kurogi, S.: Speaker detection in audio stream via probabilistic prediction using generalized GEBI. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds.) ICONIP 2016, part IV. LNCS, vol. 9950, pp. 302–311. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46681-1_37CrossRefGoogle Scholar
  8. 8.
    Bronkhorst A.W.: The cocktail-party problem revisited: early processing and selection of multi-talker speech. Atten. Percept. Psychophys. (2015).  https://doi.org/10.3758/s13414-015-0882-9CrossRefGoogle Scholar
  9. 9.
    Wang, Y., Sun, W.: Multi-speaker recognition in cocktail party problem. In: Proceedings of International Conference on Communications, Signal Processing, and Systems arXiv:1712.01742 (2017)
  10. 10.
    Bimbot, N., et al.: A tutorial on text-independent speaker verification. J. Appl. Signal Process. 2004, 430–451 (2004)Google Scholar
  11. 11.
    Kurogi, S.: Improving generalization performance via out-of-bag estimate using variable size of bags. J. Jpn. Neural Netw. Soc. 16(2), 81–92 (2009)Google Scholar
  12. 12.
    Aldhaheri, W.R., Al-Saadi, F.E.: Robust text-independent speaker recognition with short utterance in noisy environment using SVD as a matching measure. J. King Saud Univ. Comput. Inf. Sci. Arch. 17, 25–44 (2004)Google Scholar
  13. 13.
    Kurogi, S., Sato, S., Ichimaru, K.: Speaker recognition using pole distribution of speech signals obtained by bagging CAN2. In: Leung, C.S., Lee, M., Chan, J.H. (eds.) ICONIP 2009, part I. LNCS, vol. 5863, pp. 622–629. Springer, Heidelberg (2009).  https://doi.org/10.1007/978-3-642-10677-4_71CrossRefGoogle Scholar

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Toshiki Tagomori
    • 1
  • Kazuya Matsuo
    • 1
  • Shuichi Kurogi
    • 1
    Email author
  1. 1.Kyushu Institute of TechnologyKitakyushuJapan

Personalised recommendations