Modulation Processing for Speech Enhancement

Speech and Audio Processing for Coding, Enhancement and Recognition

Abstract

Many traditional speech enhancement methods reduce noise in corrupted speech by processing the magnitude spectrum within a short-time Fourier analysis-modification-synthesis (AMS) framework. More recently, the modulation domain has been investigated for speech processing; however, early efforts in this direction did not account for the changing properties of the modulation spectrum across time. Motivated by this, and by evidence of the significance of the modulation domain, we investigated processing the modulation spectrum on a short-time basis for speech enhancement. For this purpose, a modulation-domain AMS framework was used, in which the trajectory of each acoustic frequency bin was processed frame-wise in a secondary AMS framework. A number of enhancement algorithms were investigated in the short-time modulation domain, including spectral subtraction and MMSE magnitude estimation. In each case, the respective algorithm was used to modify the short-time modulation magnitude spectrum within the modulation AMS framework. Here we review the findings of this investigation, comparing the quality of stimuli enhanced using these modulation-based approaches with that of stimuli enhanced using the corresponding modification algorithms applied in the acoustic domain. The results show that the modulation-domain approaches give improved quality compared to their acoustic-domain counterparts. Further, MMSE modulation magnitude estimation (MME) is shown to give improved speech quality compared to modulation spectral subtraction (ModSSub). MME stimuli exhibit good noise removal without the introduction of musical noise, which is problematic in spectral-subtraction-based enhancement. The results also show that, for an appropriately selected modulation frame duration, ModSSub produces minimal musical noise compared to acoustic spectral subtraction.
For modulation-domain methods, modulation frame duration is shown to be an important parameter, with quality generally improved by the use of shorter frame durations. From the results of the experiments conducted, it is concluded that the short-time modulation domain provides an effective alternative to the short-time acoustic domain for speech processing. Further, in this domain, MME provides effective noise suppression without introducing musical noise distortion.
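The dual AMS structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`stft`, `istft`, `mod_spectral_subtraction`), the frame sizes, the spectral floor `beta`, and the assumption that the first few modulation frames contain noise only are all hypothetical choices made for illustration. The sketch also assumes that the noisy modulation phase and noisy acoustic phase are reused unchanged, and it applies a basic power spectral subtraction rule in the modulation domain.

```python
import numpy as np

def stft(x, frame_len, hop):
    """Hann-windowed STFT of a 1-D real signal; returns (frames, bins)."""
    n_frames = 1 + max(0, -(-(len(x) - frame_len) // hop))  # ceil division
    padded = np.zeros((n_frames - 1) * hop + frame_len)
    padded[:len(x)] = x  # zero-pad so the last frame is complete
    win = np.hanning(frame_len)
    frames = np.stack([padded[i * hop:i * hop + frame_len] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, frame_len, hop, length):
    """Windowed overlap-add synthesis, normalised by the summed squared window."""
    win = np.hanning(frame_len)
    total = (spec.shape[0] - 1) * hop + frame_len
    out, norm = np.zeros(total), np.zeros(total)
    for i, frame in enumerate(np.fft.irfft(spec, n=frame_len, axis=1)):
        sl = slice(i * hop, i * hop + frame_len)
        out[sl] += frame * win
        norm[sl] += win ** 2
    # The floor avoids amplifying samples at the signal edges.
    return (out / np.maximum(norm, 1e-3))[:length]

def mod_spectral_subtraction(noisy, frame_len=256, hop=64, mod_len=32,
                             mod_hop=4, noise_mod_frames=10, beta=0.002):
    """Spectral subtraction on short-time modulation magnitude spectra.

    All defaults are illustrative assumptions, not values from the chapter.
    """
    # First (acoustic) AMS stage: analysis.
    X = stft(noisy, frame_len, hop)            # (time frames, acoustic bins)
    mag, phase = np.abs(X), np.angle(X)
    enhanced_mag = np.empty_like(mag)
    for k in range(mag.shape[1]):
        # Second (modulation) AMS stage on the trajectory of acoustic bin k.
        M = stft(mag[:, k], mod_len, mod_hop)  # (mod frames, mod bins)
        power = np.abs(M) ** 2
        # Assumption: the leading modulation frames are speech-free.
        noise_power = power[:noise_mod_frames].mean(axis=0)
        clean_power = np.maximum(power - noise_power, beta * power)
        M_hat = np.sqrt(clean_power) * np.exp(1j * np.angle(M))
        traj = istft(M_hat, mod_len, mod_hop, mag.shape[0])
        enhanced_mag[:, k] = np.maximum(traj, 0.0)  # magnitudes are non-negative
    # Acoustic synthesis with the noisy acoustic phase.
    return istft(enhanced_mag * np.exp(1j * phase), frame_len, hop, len(noisy))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 8000
    t = np.arange(3 * fs) / fs
    # Synthetic stand-in for speech: a 4 Hz amplitude-modulated tone after 1 s of silence.
    clean = np.where(t >= 1.0,
                     (1 + np.cos(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t),
                     0.0)
    noisy = clean + 0.3 * rng.standard_normal(t.size)
    enhanced = mod_spectral_subtraction(noisy)
    print(enhanced.shape)
```

At an 8 kHz sampling rate, the default values correspond to a 32 ms acoustic frame with an 8 ms shift and a 256 ms modulation frame with a 32 ms shift. Changing `mod_len` is one way to explore the dependence on modulation frame duration that the abstract highlights.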


Notes

  1. Note that for references made to the magnitude, phase or complex spectra throughout this text, the STFT modifier is implied unless otherwise stated. The acoustic and modulation modifiers are also included to disambiguate between acoustic and modulation domains.



Corresponding author

Correspondence to Kuldip Paliwal.


Copyright information

© 2015 Springer Science+Business Media New York

About this chapter

Cite this chapter

Paliwal, K., Schwerin, B. (2015). Modulation Processing for Speech Enhancement. In: Ogunfunmi, T., Togneri, R., Narasimha, M. (eds) Speech and Audio Processing for Coding, Enhancement and Recognition. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-1456-2_10


  • DOI: https://doi.org/10.1007/978-1-4939-1456-2_10

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-1455-5

  • Online ISBN: 978-1-4939-1456-2

  • eBook Packages: Engineering, Engineering (R0)
