
A Primer on Deep Learning Architectures and Applications in Speech Processing

  • Tokunbo Ogunfunmi (Email author)
  • Ravi Prakash Ramachandran
  • Roberto Togneri
  • Yuanjun Zhao
  • Xianjun Xia

Abstract

In recent years, deep-learning-based machine learning methods have demonstrated remarkable success on a wide range of learning tasks across multiple domains. They are well suited to complex classification and regression problems in applications such as computer vision, speech recognition and other branches of pattern analysis. The purpose of this article is to contribute a timely review and introduction of state-of-the-art discriminative deep learning techniques, namely deep neural networks (DNNs), convolutional neural networks (CNNs) and recurrent neural networks (RNNs), covering the basic frameworks and algorithms, hardware implementations, applications in speech, and the overall benefits of deep learning.
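To make the notion of a discriminative DNN concrete, the sketch below shows the forward pass of a small fully connected network that maps a feature vector (for example, a frame of speech features) to class posterior probabilities. This is an illustrative minimal example only; the function names, layer sizes and random weights are our own and are not taken from the article.

```python
import numpy as np

def relu(x):
    # Rectified linear unit: elementwise max(0, x), a common DNN activation.
    return np.maximum(0.0, x)

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dnn_forward(x, weights, biases):
    """Forward pass of a fully connected discriminative DNN.

    Hidden layers use ReLU; the output layer uses softmax, so the
    network emits a probability distribution over the classes.
    """
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    return softmax(h @ weights[-1] + biases[-1])

# Toy configuration: a 10-dim input, two hidden layers of 16 units,
# and 3 output classes (all sizes chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
sizes = [10, 16, 16, 3]
Ws = [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(n) for n in sizes[1:]]
p = dnn_forward(rng.standard_normal(10), Ws, bs)
```

The output `p` is a length-3 vector of non-negative class probabilities summing to 1; training such a network (e.g. by backpropagation with a cross-entropy loss) is what the discriminative methods reviewed in the article address.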

Keywords

Deep learning · Signal processing · Discriminative algorithms


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Tokunbo Ogunfunmi (1, Email author)
  • Ravi Prakash Ramachandran (2)
  • Roberto Togneri (3)
  • Yuanjun Zhao (3)
  • Xianjun Xia (3)
  1. Department of Electrical Engineering, Santa Clara University, Santa Clara, USA
  2. Department of Electrical and Computer Engineering, Rowan University, Glassboro, USA
  3. Department of Electrical, Electronic and Computer Engineering, The University of Western Australia, Perth, Australia
