Circuits, Systems, and Signal Processing

, Volume 37, Issue 8, pp 3651–3670 | Cite as

Image Processing Techniques for Segments Grouping in Monaural Speech Separation

  • S. ShobaEmail author
  • R. Rajavel


Monaural speech separation is the process of separating the target speech from the noisy speech mixture recorded using single microphone. It is a challenging problem in speech signal processing, and recently, computational auditory scene analysis (CASA) finds a reasonable solution to solve this problem. This research work proposes an image analysis-based algorithm to enhance the binary T–F mask obtained in the initial segmentation stage of CASA-based monaural speech separation systems to improve the speech quality. The proposed algorithm consists of labeling the initial segmentation mask, boundary extraction, active pixel detection and finally eliminating the noisy non-active pixels. In labeling, the T–F mask obtained from the initial segmentation is labeled as periodicity pixel matrix and non-periodicity pixel matrix. Next boundaries are created by connecting all the possible nearby periodicity pixel matrix and non-periodicity pixel matrix as speech boundary. Some speech boundary may include noisy T–F units as holes, and these holes are treated using the proposed algorithm to properly classify them as the speech-dominant or noise-dominant T–F units in the active pixel detection process. Finally, the noisy T–F units are eliminated. The performance of the proposed algorithm is evaluated using TIMIT speech database. The experimental results show that the proposed algorithm improves the quality of the separated speech by increasing the signal-to-noise ratio by an average value of 9.64 dB and reduces the noise residue by 25.55% as compared to the noisy speech mixture.


CASA Image analysis Morphological image processing Speech separation Noise residue 


  1. 1.
    A.K.H. Al-Ali, D. David, S. Bouchra, C. Vinod, G.R. Naik, Enhanced forensic speaker verification using a combination of DWT and MFCC feature warping in the presence of noise and reverberation conditions. IEEE Access (2017). Google Scholar
  2. 2.
    A. Bednar, M.B. Francis, C.L. Edmund, Different spatio-temporal electroencephalography features drive the successful decoding of binaural and monaural cues for sound localization. Eur. J. Neurosci. 45(5), 679–689 (2017)CrossRefGoogle Scholar
  3. 3.
    S.F. Boll, Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech Signal Process. 27, 113–120 (1979)CrossRefGoogle Scholar
  4. 4.
    G.J. Brown, M.P. Cooke, Computational auditory scene analysis. Comput. Speech Lang. 8(4), 297–336 (1994)CrossRefGoogle Scholar
  5. 5.
    G.J. Brown, D.L. Wang, Separation of speech by computational auditory scene analysis, in Speech Enhancement, ed. by J. Benesty, S. Makino, J. Chen (Springer, New York, 2005), pp. 371–402CrossRefGoogle Scholar
  6. 6.
    M. Dharmalingam, M.C. JohnWiselin, R. Rajavel, Optimizing the objective measure of speech quality in monaural speech separation, in Proceedings of 3rd International Conference on Advanced Computing, Networking and Informatics, pp. 545–552 (2016)Google Scholar
  7. 7.
    Y. Ephraim, H.L. Trees, A signal subspace approach for speech enhancement. IEEE Trans. Speech Audio Process. 3(4), 251–266 (1995)CrossRefGoogle Scholar
  8. 8.
    N. Harish, R. Rajavel, Monaural speech separation system based on optimum soft mask, in Proceedings of IEEE International Conference on Computational Intelligence and Computing Research, Coimbatore, India, pp. 1–5 (2014)Google Scholar
  9. 9.
    G. Hu, D. Wang, Monaural speech segregation based on pitch tracking and amplitude modulation. IEEE Trans. Neural Netw. 15(5), 1135–1150 (2004)CrossRefGoogle Scholar
  10. 10.
    G. Hu, D. Wang, An auditory scene analysis approach to monaural speech segregation, in Topics in Acoustic Echo and Noise Control, ed. by E. Hansler, G. Schmidt (Springer, New York, 2006), pp. 485–515Google Scholar
  11. 11.
    G. Hu, D. Wang, Auditory segmentation based on onset and offset analysis. IEEE Trans. Audio Speech Lang. Process. 15(2), 396–405 (2007)CrossRefGoogle Scholar
  12. 12.
    K. Hu, D. Wang, Unvoiced speech segregation from non-speech interference via CASA and spectral subtraction. IEEE Trans. Audio Speech Lang. Process. 19(6), 1600–1609 (2011)CrossRefGoogle Scholar
  13. 13.
    A. Hyvarinen, J. Karhunen, E. Oja, Independent Component Analysis (Wiley, New York, 2001)CrossRefGoogle Scholar
  14. 14.
    J. Jensen, J.H.L. Hansen, Speech enhancement using a constrained iterative sinusoidal model. IEEE Trans. Speech Audio Process. 9(7), 731–740 (2001)CrossRefGoogle Scholar
  15. 15.
    F.L. Lamel, R.H. Kassel, S. Seneff, Speech database development: design and analysis of the acoustic-phonetic corpus, in Proceedings of DARPA Speech Recognition Workshop, Report No SAIC -86/1546 (1986)Google Scholar
  16. 16.
    R. Meddis, Simulation of auditory-neural transduction: further studies. J. Acoust. Soc. Am. 83(3), 1056–1063 (1988)CrossRefGoogle Scholar
  17. 17.
    G.R. Naik, W. Wang, Audio analysis of statistically instantaneous signals with mixed Gaussian probability distributions. Int. J. Electron. 99(10), 1333–135 (2012)CrossRefGoogle Scholar
  18. 18.
    G.R. Naik, Measure of quality of source separation for sub-and super-Gaussian audio mixtures. Informatica 23(4), 581–599 (2012)MathSciNetzbMATHGoogle Scholar
  19. 19.
    R.D. Patterson, I. Nimmo-Smith, J. Holdsworth, et al., An efficient auditory filterbank based on the gammatone function. MRC Applied Psychology Unit (1988)Google Scholar
  20. 20.
    R. Pichevar, J. Rouat, A quantitative evaluation of a bio-inspired sound segregation technique for two- and three-source mixtures, in Nonlinear Speech Modeling and Applications (Lecture Notes in Computer Science), ed. by G. Chollet, A. Esposito, M. Faundez-Zanuy, M. Marinaro (Springer, Berlin, 2005), pp. 430–435Google Scholar
  21. 21.
    H. Sameti, H. Sheikhzadeh, L. Deng et al., HMM-based strategies for enhancement of speech signals embedded in non-stationary noise. IEEE Trans. Speech Audio Process. 6(5), 445–455 (1998)CrossRefGoogle Scholar
  22. 22.
    S. Shoba, R. Rajavel, Adaptive energy threshold selection for monaural speech separation, in Proceedings of IEEE International Conference on Communication and Signal Process, Melmaruvathur, India (2017)Google Scholar
  23. 23.
    I. Trowitzsch, Robust detection of environmental sounds in binaural auditory scenes. IEEE/ACM Trans. Audio Speech Lang. Process. 25(6), 1344–1356 (2017)CrossRefGoogle Scholar
  24. 24.
    D.L. Wang, H. Kun, Towards generalizing classification based speech separation. IEEE Trans. Audio Speech Lang. Process 21(1), 68–177 (2013)Google Scholar
  25. 25.
    D. Wang, Tandem algorithm for pitch estimation and voiced speech segregation. IEEE Trans. Audio Speech Lang. Process 18(8), 2067–2079 (2012)Google Scholar
  26. 26.
    D.L. Wang, G.J. Brown, Separation of speech from interfering sounds based on oscillatory correlation. IEEE Trans. Neural Netw. 10, 684–697 (1999)CrossRefGoogle Scholar
  27. 27.
    Y. Wang, J. Lin, Improved monaural speech segregation based on computational auditory scene analysis. J. Audio Speech Music Process. (2013). Google Scholar
  28. 28.
    M. Weintraub, A Theory and Computational Model of Auditory Monaural Sound Separation. Ph.D. dissertation (Stanford University, Standford, CA, 1985)Google Scholar
  29. 29.
    X. Zhang, D. Wang, Deep learning based binaural speech separation in reverberant environments. IEEE/ACM Trans. Audio Speech Lang. Process. 25(5), 1075–1084 (2017)CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. 1.Department of Electronics and Communication EngineeringSSN College of EngineeringChennaiIndia

Personalised recommendations