
A Real-Time Speech Enhancement Framework for Multi-party Meetings

  • Conference paper
Advances in Nonlinear Speech Processing (NOLISP 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7015)


Abstract

This paper proposes a real-time speech enhancement framework that operates in the presence of multiple sources in reverberant environments. The aim is to automatically reduce the distortions that room reverberation introduces into the available distant-speech signals, and thus to significantly improve speech quality for each speaker. The overall framework is composed of three cooperating blocks, each fulfilling a specific task: speaker diarization, room impulse response identification, and speech dereverberation. In particular, the speaker diarization algorithm is essential for steering the operations performed in the other two stages according to the speakers' activity in the room. Extensive computer simulations have been performed using a subset of the AMI database; the results obtained show the effectiveness of the approach.
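The way the three blocks cooperate can be illustrated with a minimal control-flow sketch. It is not the authors' implementation: the class `EnhancementPipeline`, the `SpeakerState` layout, the single-channel `Frame` type, and the three `toy_*` functions are all hypothetical stand-ins. The sketch only shows how a diarization decision might select which speaker's impulse-response estimate is updated and used for dereverberation on a given frame.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Frame = List[float]  # one block of samples; a real system would carry several microphone channels


@dataclass
class SpeakerState:
    """Per-speaker state (hypothetical layout, not taken from the paper)."""
    rir_estimate: Optional[List[float]] = None  # current impulse-response estimate
    frames_seen: int = 0                        # frames in which this speaker was active


class EnhancementPipeline:
    """Diarization decides which speaker's state the other two stages touch."""

    def __init__(
        self,
        diarize: Callable[[Frame], Optional[str]],
        identify_rir: Callable[[Frame, Optional[List[float]]], List[float]],
        dereverberate: Callable[[Frame, List[float]], Frame],
    ) -> None:
        self.diarize = diarize
        self.identify_rir = identify_rir
        self.dereverberate = dereverberate
        self.speakers: Dict[str, SpeakerState] = {}

    def process(self, frame: Frame) -> Optional[Frame]:
        speaker = self.diarize(frame)
        if speaker is None:
            # Nobody is talking: leave every estimate frozen and emit nothing.
            return None
        state = self.speakers.setdefault(speaker, SpeakerState())
        state.frames_seen += 1
        # Only the active speaker's estimate is adapted on this frame.
        state.rir_estimate = self.identify_rir(frame, state.rir_estimate)
        return self.dereverberate(frame, state.rir_estimate)


# Toy stand-ins so the control flow can be exercised; none of these is a real algorithm.
def toy_diarize(frame: Frame) -> Optional[str]:
    energy = sum(x * x for x in frame) / len(frame)
    if energy < 1e-4:
        return None  # treated as silence
    return "spk_A" if frame[0] >= 0 else "spk_B"


def toy_identify_rir(frame: Frame, previous: Optional[List[float]]) -> List[float]:
    # Placeholder: start from a unit impulse and never adapt it.
    return previous if previous is not None else [1.0]


def toy_dereverberate(frame: Frame, rir: List[float]) -> Frame:
    # Placeholder: undo a single-tap gain.
    return [x / rir[0] for x in frame]


pipeline = EnhancementPipeline(toy_diarize, toy_identify_rir, toy_dereverberate)
frames = [[0.5, 0.2], [0.0, 0.0], [-0.4, 0.1], [0.3, 0.3]]
outputs = [pipeline.process(f) for f in frames]
```

In this sketch the silent second frame produces no output and leaves every estimate untouched, while the other frames update only the state of the speaker labelled active. Real diarization, identification, and dereverberation stages would replace the three `toy_*` callables without changing the gating logic.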




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rotili, R., Principi, E., Squartini, S., Schuller, B. (2011). A Real-Time Speech Enhancement Framework for Multi-party Meetings. In: Travieso-González, C.M., Alonso-Hernández, J.B. (eds) Advances in Nonlinear Speech Processing. NOLISP 2011. Lecture Notes in Computer Science, vol 7015. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25020-0_11


  • DOI: https://doi.org/10.1007/978-3-642-25020-0_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25019-4

  • Online ISBN: 978-3-642-25020-0

  • eBook Packages: Computer Science, Computer Science (R0)
