Abstract
Blind speech deconvolution aims to recover both the original speech source and the room impulse response (RIR) from observed reverberant speech, which can improve speech intelligibility and speech perception. The problem is ill-posed, so solving it usually requires prior information, such as the sparseness of the signal or of the acoustic channel. In this paper, we propose a blind speech deconvolution method based on joint \(L1-L2\) regularisation for a single-input single-output (SISO) acoustic system with a high level of reverberation. The method accounts for both the sparse and the dense parts of the RIR by imposing an \(L1\) norm constraint on its early part and an \(L2\) norm constraint on its late part. Using an alternating strategy, the source signal and the early part of the RIR can be well reconstructed, while the late part of the RIR is suppressed.
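The abstract does not give the objective function or the solver, so the following is only a minimal sketch of how an early/late split of the RIR could be regularised inside one step of an alternating scheme. It assumes an \(L1\) penalty on the early taps and a squared \(L2\) penalty on the late taps, both handled by their standard proximal operators. The function name `prox_early_late` and the parameters `split`, `lam1`, `lam2` and `step` are hypothetical and do not come from the paper.

```python
import numpy as np

def prox_early_late(h, split, lam1, lam2, step):
    """Proximal step for step * (lam1 * ||h_early||_1 + 0.5 * lam2 * ||h_late||_2^2).

    Illustrative assumption only: h[:split] is treated as the early (sparse)
    part of the RIR and h[split:] as the late (dense) part.
    """
    h = np.asarray(h, dtype=float)
    out = np.empty_like(h)
    early, late = h[:split], h[split:]
    # L1 penalty: soft-thresholding zeroes small early taps (promotes sparsity).
    out[:split] = np.sign(early) * np.maximum(np.abs(early) - step * lam1, 0.0)
    # Squared L2 penalty: uniform shrinkage attenuates the late taps.
    out[split:] = late / (1.0 + step * lam2)
    return out
```

In an alternating strategy of this kind, a step like this would be applied to the RIR estimate after a data-fidelity update with the source estimate held fixed, and the source would then be updated with the RIR held fixed. The actual update rules used by the authors are given in the full paper, not in this abstract.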
Acknowledgements
The work was conducted when J. Guan visited the Centre for Vision, Speech and Signal Processing (CVSSP), the University of Surrey. This work was supported in part by the International Exchange and Cooperation Foundation of Shenzhen City, China (No. GJHZ20150312114149569).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Guan, J., Wang, X., Xie, Z., Qi, S., Wang, W. (2018). Joint \(L1-L2\) Regularisation for Blind Speech Deconvolution. In: Zeng, B., Huang, Q., El Saddik, A., Li, H., Jiang, S., Fan, X. (eds) Advances in Multimedia Information Processing – PCM 2017. PCM 2017. Lecture Notes in Computer Science, vol 10735. Springer, Cham. https://doi.org/10.1007/978-3-319-77380-3_80
DOI: https://doi.org/10.1007/978-3-319-77380-3_80
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77379-7
Online ISBN: 978-3-319-77380-3
eBook Packages: Computer Science (R0)