Abstract
This paper proposes a real-time speech enhancement framework working in presence of multiple sources in reverberated environments. The aim is to automatically reduce the distortions introduced by room reverberation in the available distant speech signals and thus to achieve a significant improvement of speech quality for each speaker. The overall framework is composed by three cooperating blocks, each one fulfilling a specific task: speaker diarization, room-impulse response identification and speech dereverberation. In particular the speaker diarization algorithm is essential to pilot the operations performed in the other two stages in accordance with speakers’ activity in the room. Extensive computer simulations have been performed by using a subset of the AMI database: Obtained results show the effectiveness of the approach.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Naylor, P., Gaubitch, N.: Speech Dereverberation. Signals and Communication Technology. Springer, Heidelberg (2010)
Rotili, R., De Simone, C., Perelli, A., Cifani, S., Squartini, S.: Joint multichannel blind speech separation and dereverberation: A real-time algorithmic implementation. In: Huang, D.-S., McGinnity, M., Heutte, L., Zhang, X.-P. (eds.) ICIC 2010. CCIS, vol. 93, pp. 85–93. Springer, Heidelberg (2010)
Rotili, R., Principi, E., Squartini, S., Schuller, B.: Real-time speech recognition in a multi-talker reverberated acoustic scenario. In: Proc. of ICIC, August 11-14 (to appear, 2011)
Rotili, R., Principi, E., Squartini, S., Piazza, F.: Real-time joint blind speech separation and dereverberation in presence of overlapping speakers. In: Liu, D., Zhang, H., Polycarpou, M., Alippi, C., He, H. (eds.) ISNN 2011, Part II. LNCS, vol. 6676, pp. 437–446. Springer, Heidelberg (2011)
Araki, S., Hori, T., Fujimoto, M., Watanabe, S., Yoshioka, T., Nakatani, T., Nakamura, A.: Online meeting recognizer with multichannel speaker diarization. In: Proc. of Conf. on Signals, Systems and Computers, pp. 1697–1701 (November 2010)
Carletta, J., Ashby, S., Bourban, S., Flynn, M., Guillemot, M., Hain, T., et al.: The AMI meeting corpus: A pre-announcement. In: Renals, S., Bengio, S. (eds.) MLMI 2005. LNCS, vol. 3869, pp. 28–39. Springer, Heidelberg (2006)
Huang, Y., Benesty, J.: A class of frequency-domain adaptive approaches to blind multichannel identification. IEEE Trans. on Speech and Audio Process. 51(1), 11–24 (2003)
Rotili, R., Cifani, S., Principi, E., Squartini, S., Piazza, F.: A robust iterative inverse filtering approach for speech dereverberation in presence of disturbances. In: Proc. of IEEE APCCAS, pp. 434–437 (December 2008)
Vinyals, O., Friedland, G.: Towards semantic analysis of conversations: A system for the live identification of speakers in meetings. In: Proc. of IEEE International Conference on Semantic Computing, pp. 426–431 (August 2008)
Squartini, S., Ciavattini, E., Lattanzi, A., Zallocco, D., Bettarelli, F., Piazza, F.: NU-Tech: implementing DSP algorithms in a plug-in based software platform for real time audio applications. In: Proc. of 118th Conv. of the AES (2005)
Habets, E.: Room impulse response (RIR) generator (May 2008), http://home.tiscali.nl/ehabets/rirgenerator.html
Wöllmer, M., Marchi, E., Squartini, S., Schuller, B.: Robust multi-stream keyword and non-linguistic vocalization detection for computationally intelligent virtual agents. In: Liu, D., Zhang, H., Polycarpou, M., Alippi, C., He, H. (eds.) ISNN 2011, Part II. LNCS, vol. 6676, pp. 496–505. Springer, Heidelberg (2011)
Hung, H., Huang, Y., Friedland, G., Gatica-Perez, D.: Estimating dominance in multi-party meetings using speaker diarization. IEEE Trans. on Audio, Speech, and Lang. Process. 19(4), 847–860 (2011)
Schuller, B., Batliner, A., Steidl, S., Seppi, D.: Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge. Speech Communication, 1062–1087 (February 2011)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rotili, R., Principi, E., Squartini, S., Schuller, B. (2011). A Real-Time Speech Enhancement Framework for Multi-party Meetings. In: Travieso-González, C.M., Alonso-Hernández, J.B. (eds) Advances in Nonlinear Speech Processing. NOLISP 2011. Lecture Notes in Computer Science(), vol 7015. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25020-0_11
Download citation
DOI: https://doi.org/10.1007/978-3-642-25020-0_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25019-4
Online ISBN: 978-3-642-25020-0
eBook Packages: Computer ScienceComputer Science (R0)