# Recurrent Neural Network-Based Dictionary Learning for Compressive Speech Sensing

## Abstract

We propose a novel dictionary learning technique for compressive sensing of speech signals based on recurrent neural networks. First, we employ a recurrent neural network to solve an \(\ell _{0}\)-norm optimization problem, based on a sequential linear prediction model, for estimating the linear prediction coefficients of voiced and unvoiced speech separately. The extracted linear prediction coefficient vectors are then clustered by an improved Linde–Buzo–Gray algorithm to generate separate codebooks for voiced and unvoiced speech. A dictionary is constructed for each speech type by concatenating a union of structured matrices derived from the column vectors of the corresponding codebook. Next, a decision module selects the appropriate dictionary for the recovery algorithm in the compressive sensing system. Finally, building on the sequential linear prediction model and the proposed dictionaries, a sequential recovery algorithm is introduced to further improve the quality of the reconstructed speech. Experimental results show that, compared with selected state-of-the-art approaches, the proposed method achieves superior performance on several objective measures, including segmental signal-to-noise ratio, perceptual evaluation of speech quality, and short-time objective intelligibility, under both noise-free and noise-aware conditions.
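As background for the recovery stage mentioned above, the following minimal sketch shows generic compressive sensing of a sparse signal: a random Gaussian measurement matrix compresses the signal, and orthogonal matching pursuit recovers it. This is a standard textbook illustration, not the paper's RNN-based sequential recovery; the dimensions, sparsity level, and helper function are all illustrative assumptions.

```python
import numpy as np

# Toy compressive sensing demo: measure a k-sparse signal with a random
# Gaussian matrix, then recover it greedily with orthogonal matching
# pursuit (OMP). Illustrative only; not the paper's proposed method.

rng = np.random.default_rng(0)
n, m, k = 64, 40, 4                    # signal length, measurements, sparsity

x = np.zeros(n)                        # ground-truth k-sparse signal
support = rng.choice(n, k, replace=False)
x[support] = rng.standard_normal(k)

Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # measurement matrix
y = Phi @ x                                      # compressed measurements

def omp(Phi, y, k):
    """Greedy OMP: repeatedly pick the column most correlated with the
    residual, then least-squares refit on the selected support."""
    residual, idx = y.copy(), []
    for _ in range(k):
        idx.append(int(np.argmax(np.abs(Phi.T @ residual))))
        coef, *_ = np.linalg.lstsq(Phi[:, idx], y, rcond=None)
        residual = y - Phi[:, idx] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[idx] = coef
    return x_hat

x_hat = omp(Phi, y, k)                 # noiseless case: exact recovery expected
```

With enough measurements relative to the sparsity level, the greedy support selection finds the true nonzero locations and the least-squares refit recovers their amplitudes exactly in the noiseless case.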

## Keywords

Recurrent neural network, Linear prediction coefficient, Clustering, Sequential recovery algorithm, Compressive sensing

## Notes

### Acknowledgements

This work is supported in part by the Natural Sciences and Engineering Research Council of Canada, the National Natural Science Foundation of China (Grant Nos. 61601248, 61771263, 61871241) and the University Natural Science Research Foundation of Jiangsu Province, China (Grant No. 16KJB510037).
