Signal speech reconstruction and noise removal using convolutional denoising audioencoders with neural deep learning

  • Houda Abouzid
  • Otman Chakkor
  • Oscar Gabriel Reyes
  • Sebastian Ventura


Real-world datasets come in many formats (audio, music, images, ...). In our case, the data come from various sources mixed together. These mixtures are noisy audio data from which features must be extracted, compressed and analysed so that they can be presented in a standard way. The resulting data will be used for the Blind Source Separation (BSS) task. In this paper, we work with two types of autoencoders: convolutional and denoising. The novelty of our work is to reconstruct the audio signal at the output of the neural network after extracting the meaningful features that carry the clean and most informative content. Simulation results show strong performance, yielding 87% for the reconstructed signals, which will be included in an automated system for real-world applications.
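
A denoising autoencoder is trained on pairs of a corrupted input and its clean target. The sketch below illustrates only that data-preparation idea and is not the authors' pipeline: the sine tone standing in for speech, the white Gaussian noise, the 5 dB signal-to-noise ratio, the frame length and hop size, and all function names are assumptions made for illustration.

```python
import math
import random


def make_tone(n_samples, freq_hz=440.0, sample_rate=8000):
    """Synthetic 'clean' signal: a pure sine tone (a stand-in for speech)."""
    return [math.sin(2.0 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]


def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)


def add_noise_at_snr(clean, snr_db, rng):
    """Corrupt `clean` with white Gaussian noise scaled to a target SNR in dB."""
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    # Scale the noise so that 10 * log10(P_clean / P_noise) equals snr_db.
    target_noise_power = power(clean) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / power(noise))
    return [c + scale * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR of `noisy` relative to `clean`, in dB."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10.0 * math.log10(power(clean) / power(residual))


def frame(signal, size, hop):
    """Split a signal into fixed-length, possibly overlapping frames."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


rng = random.Random(0)  # fixed seed so the example is reproducible
clean = make_tone(8000)
noisy = add_noise_at_snr(clean, snr_db=5.0, rng=rng)

# Each training pair is (noisy frame as input, clean frame as target).
pairs = list(zip(frame(noisy, 256, 128), frame(clean, 256, 128)))
```

Pairs built this way could then be fed to a convolutional autoencoder, for example one defined in Keras, with the noisy frame as input and the clean frame as the reconstruction target.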


Denoising autoencoder, Convolutional autoencoder, BSS, Keras, Deep learning, Neural network



Drs. Reyes and Ventura acknowledge the financial support of the Spanish Ministry of Economy and Competitiveness and the Fund of Regional Development (Project TIN2017-83445-P).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Telecommunications Department, ENSATE, Abdelmalek Essaadi University, Tétouan, Morocco
  2. Department of Computer Science and Numerical Analysis, University of Cordoba, Córdoba, Spain
