A Real-Time Speech Enhancement Framework for Multi-party Meetings

Rotili, Rudy; Principi, Emanuele; Squartini, Stefano; Schuller, Björn

doi:10.1007/978-3-642-25020-0_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7015)

Included in the following conference series:

International Conference on Nonlinear Speech Processing

Abstract

This paper proposes a real-time speech enhancement framework that operates in the presence of multiple sources in reverberant environments. The aim is to automatically reduce the distortions introduced by room reverberation in the available distant speech signals and thus to achieve a significant improvement in speech quality for each speaker. The overall framework is composed of three cooperating blocks, each fulfilling a specific task: speaker diarization, room impulse response identification, and speech dereverberation. In particular, the speaker diarization algorithm is essential for steering the operations performed in the other two stages according to the speakers' activity in the room. Extensive computer simulations have been performed on a subset of the AMI database; the results obtained show the effectiveness of the approach.
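The three-block structure described in the abstract can be illustrated with a minimal control-flow sketch. This is not the authors' implementation: the function names, the `RirEstimate` type, the fixed activity `SCHEDULE`, and the policy of updating and enhancing only the speakers flagged as active are all assumptions made for illustration, and the three stages are trivial placeholders rather than real diarization, identification, or dereverberation algorithms.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Frame = List[float]


@dataclass
class RirEstimate:
    """Hypothetical container for one speaker's room impulse response estimate."""
    taps: List[float]
    updates: int = 0  # number of frames that refined this estimate


def run_framework(
    frames: Sequence[Frame],
    diarize: Callable[[int, Frame], Set[str]],
    identify: Callable[[Frame, Optional[RirEstimate]], RirEstimate],
    dereverberate: Callable[[Frame, RirEstimate], Frame],
) -> Tuple[Dict[str, RirEstimate], List[Dict[str, Frame]]]:
    """Let the diarization output decide, frame by frame, which speakers
    the identification and dereverberation stages operate on."""
    rirs: Dict[str, RirEstimate] = {}
    outputs: List[Dict[str, Frame]] = []
    for index, frame in enumerate(frames):
        active = diarize(index, frame)
        enhanced: Dict[str, Frame] = {}
        for speaker in sorted(active):
            # Assumed policy: refine and apply the estimate only while the speaker is active.
            rirs[speaker] = identify(frame, rirs.get(speaker))
            enhanced[speaker] = dereverberate(frame, rirs[speaker])
        outputs.append(enhanced)
    return rirs, outputs


# Toy stand-ins for the three stages (assumptions, not real algorithms).
SCHEDULE: List[Set[str]] = [{"A"}, {"A"}, set(), {"B"}, {"A", "B"}]


def toy_diarize(index: int, frame: Frame) -> Set[str]:
    """Oracle diarizer: reads speaker activity from a fixed schedule."""
    return SCHEDULE[index]


def toy_identify(frame: Frame, previous: Optional[RirEstimate]) -> RirEstimate:
    """Placeholder identification: keeps a unit impulse and counts updates."""
    count = previous.updates if previous is not None else 0
    return RirEstimate(taps=[1.0], updates=count + 1)


def toy_dereverberate(frame: Frame, rir: RirEstimate) -> Frame:
    """Placeholder inverse filter: divides by the first tap (identity here)."""
    return [sample / rir.taps[0] for sample in frame]


FRAMES: List[Frame] = [[0.1 * i, -0.1 * i] for i in range(5)]

rirs, outputs = run_framework(FRAMES, toy_diarize, toy_identify, toy_dereverberate)
for index, enhanced in enumerate(outputs):
    print(index, sorted(enhanced))
```

Passing the stages in as callables keeps the diarization-driven control flow separate from the signal processing itself, so a real stage could in principle replace each placeholder without touching the loop.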

Author information

Authors and Affiliations

A3LAB, Department of Biomedics, Electronics and Telecommunications, Università Politecnica delle Marche, Via Brecce Bianche 1, 60131, Ancona, Italy
Rudy Rotili, Emanuele Principi & Stefano Squartini
Institute for Human-Machine Communication, Technische Universität München, Arcisstr. 21, 80333, Munich, Germany
Björn Schuller

Editor information

Editors and Affiliations

Institute for Technological Development and Innovation in Communications (IDETIC), Signals and Communications Department, University of Las Palmas de Gran Canaria, Campus de Tafira, s/n, 35017, Las Palmas de Gran Canaria, Spain
Carlos M. Travieso-González & Jesús B. Alonso-Hernández

About this paper

Cite this paper

Rotili, R., Principi, E., Squartini, S., Schuller, B. (2011). A Real-Time Speech Enhancement Framework for Multi-party Meetings. In: Travieso-González, C.M., Alonso-Hernández, J.B. (eds) Advances in Nonlinear Speech Processing. NOLISP 2011. Lecture Notes in Computer Science, vol 7015. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25020-0_11

DOI: https://doi.org/10.1007/978-3-642-25020-0_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25019-4
Online ISBN: 978-3-642-25020-0
eBook Packages: Computer Science, Computer Science (R0)
