Speaker Detection in Audio Stream via Probabilistic Prediction Using Generalized GEBI

  • Koki Sakata
  • Shota Sakashita
  • Kazuya Matsuo
  • Shuichi Kurogi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9950)


This paper presents a speaker detection method that uses probabilistic prediction to avoid tuning the thresholds otherwise needed to detect a speaker in an audio stream. We introduce g-GEBI (generalized GEBI) as a generalization of BI (Bayesian inference) and GEBI (Gibbs-distribution-based extended BI) to perform iterative detection of a speaker in an audio stream containing utterances from more than one speaker. We then show a method of probabilistic prediction in multiclass classification for classifying the results of speaker detection. Through numerical experiments on recorded real speech data, we examine the properties and effectiveness of the proposed method. In particular, we show that g-GEBI and g-BI (generalized BI) are more effective than the conventional BI and GEBI in an incremental speaker detection task.
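As a rough illustration of the iterative detection idea mentioned in the abstract, the sketch below applies the ordinary Bayes rule repeatedly over successive audio segments to maintain a posterior over candidate speakers. It is not the paper's g-GEBI, GEBI, or g-BI formulation, which the abstract does not define. The function names, the dictionary layout, and the exponent `beta` are hypothetical choices made for this example; with `beta = 1.0` the update is plain Bayesian inference.

```python
import math


def update_posterior(prior, likelihoods, beta=1.0):
    """One step of recursive Bayesian inference over speaker hypotheses.

    prior:       dict speaker -> probability (all > 0, summing to 1)
    likelihoods: dict speaker -> p(segment | speaker), all > 0
    beta:        hypothetical exponent on the likelihood (an assumption of
                 this sketch, not taken from the paper); 1.0 is plain Bayes
    """
    # Work in the log domain so long audio streams do not underflow.
    log_post = {
        s: math.log(prior[s]) + beta * math.log(likelihoods[s]) for s in prior
    }
    peak = max(log_post.values())
    unnormalized = {s: math.exp(v - peak) for s, v in log_post.items()}
    total = sum(unnormalized.values())
    return {s: u / total for s, u in unnormalized.items()}


def detect_speaker(segment_likelihoods, speakers, beta=1.0):
    """Start from a uniform prior and fold in one segment at a time."""
    posterior = {s: 1.0 / len(speakers) for s in speakers}
    for likelihoods in segment_likelihoods:
        posterior = update_posterior(posterior, likelihoods, beta)
    return posterior


# Made-up likelihoods for two segments and two candidate speakers.
segments = [{"A": 0.6, "B": 0.4}, {"A": 0.7, "B": 0.3}]
posterior = detect_speaker(segments, ["A", "B"])
best = max(posterior, key=posterior.get)
```

Because the result is a normalized posterior, the decision can be read off as the most probable speaker rather than by comparing a raw score against a hand-tuned threshold, which is the motivation the abstract gives for probabilistic prediction.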


Probabilistic prediction; Speaker detection; Generalized Gibbs-distribution-based extended Bayesian inference



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Koki Sakata (1)
  • Shota Sakashita (1)
  • Kazuya Matsuo (1)
  • Shuichi Kurogi (1)
  1. Kyushu Institute of Technology, Kitakyushu, Japan
