Speaker Detection in Audio Stream via Probabilistic Prediction Using Generalized GEBI

Sakata, Koki; Sakashita, Shota; Matsuo, Kazuya; Kurogi, Shuichi

doi:10.1007/978-3-319-46681-1_37

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9950)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

This paper presents a speaker detection method that uses probabilistic prediction to avoid tuning thresholds when detecting a speaker in an audio stream. We introduce g-GEBI (generalized GEBI) as a generalization of BI (Bayesian inference) and GEBI (Gibbs-distribution-based extended BI) to iteratively detect a speaker in an audio stream uttered by more than one speaker. We then show a method of probabilistic prediction in multiclass classification to classify the results of speaker detection. Through numerical experiments on recorded real speech data, we examine the properties and effectiveness of the present method. In particular, we show that g-GEBI and g-BI (generalized BI) are more effective than the conventional BI and GEBI in an incremental speaker detection task.
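As background for the BI baseline named in the abstract, the following minimal sketch shows plain sequential Bayesian inference over a set of candidate speakers: the posterior after each audio segment becomes the prior for the next. It is not the paper's g-GEBI or GEBI, whose definitions are not given in the abstract. The function names, the three-speaker setup, and the likelihood values are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def bayes_update(prior: Sequence[float], likelihood: Sequence[float]) -> List[float]:
    """One Bayesian step: posterior is proportional to prior * likelihood."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("all likelihoods are zero; posterior is undefined")
    return [u / total for u in unnormalized]


def iterate_posterior(
    prior: Sequence[float],
    likelihoods_per_segment: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Apply bayes_update once per segment and record each posterior."""
    posterior = list(prior)
    history = []
    for likelihood in likelihoods_per_segment:
        posterior = bayes_update(posterior, likelihood)
        history.append(posterior)
    return history


# Hypothetical data: three candidate speakers, a uniform prior, and
# made-up per-segment likelihoods p(segment | speaker).
prior = [1 / 3, 1 / 3, 1 / 3]
segments = [
    [0.5, 0.3, 0.2],
    [0.6, 0.3, 0.1],
    [0.7, 0.2, 0.1],
]

history = iterate_posterior(prior, segments)
final = history[-1]

# Index of the most probable speaker after all segments.
best = max(range(len(final)), key=final.__getitem__)
print(best, [round(p, 4) for p in final])
```

In this sketch the decision is read off the final posterior with an argmax rather than compared against a tuned threshold, which is one simple way to illustrate the motivation stated in the abstract.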

Author information

Authors and Affiliations

Kyushu Institute of Technology, Tobata, Kitakyushu, Fukuoka, 804-8550, Japan
Koki Sakata, Shota Sakashita, Kazuya Matsuo & Shuichi Kurogi

Corresponding author

Correspondence to Shuichi Kurogi.

Editor information

Editors and Affiliations

The University of Tokyo, Tokyo, Japan
Akira Hirose
Kobe University, Kobe, Japan
Seiichi Ozawa
Okinawa Institute of Science and Technology Graduate University, Onna, Japan
Kenji Doya
Nara Institute of Science and Technology, Ikoma, Japan
Kazushi Ikeda
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee
Chinese Academy of Sciences, Beijing, China
Derong Liu

About this paper

Cite this paper

Sakata, K., Sakashita, S., Matsuo, K., Kurogi, S. (2016). Speaker Detection in Audio Stream via Probabilistic Prediction Using Generalized GEBI. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9950. Springer, Cham. https://doi.org/10.1007/978-3-319-46681-1_37

DOI: https://doi.org/10.1007/978-3-319-46681-1_37
Published: 30 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46680-4
Online ISBN: 978-3-319-46681-1
eBook Packages: Computer Science, Computer Science (R0)
