Modulation Processing for Speech Enhancement

Paliwal, Kuldip; Schwerin, Belinda

doi:10.1007/978-1-4939-1456-2_10

Abstract

Many traditional speech enhancement methods reduce noise in corrupted speech by processing the magnitude spectrum in a short-time Fourier analysis-modification-synthesis (AMS) based framework. More recently, use of the modulation domain for speech processing has been investigated; however, early efforts in this direction did not account for the changing properties of the modulation spectrum across time. Motivated by this, and by evidence of the significance of the modulation domain, we investigated the processing of the modulation spectrum on a short-time basis for speech enhancement. For this purpose, a modulation domain-based AMS framework was used, in which the trajectory of each acoustic frequency bin was processed frame-wise in a secondary AMS framework. A number of different enhancement algorithms were investigated for the enhancement of speech in the short-time modulation domain, including spectral subtraction and MMSE magnitude estimation. In each case, the respective algorithm was used to modify the short-time modulation magnitude spectrum within the modulation AMS framework. Here we review the findings of this investigation, comparing the quality of stimuli enhanced using these modulation-based approaches to that of stimuli enhanced using the corresponding modification algorithms applied in the acoustic domain. The results presented show that modulation domain-based approaches give improved quality compared to their acoustic domain counterparts. Further, MMSE modulation magnitude estimation (MME) is shown to give improved speech quality compared to modulation spectral subtraction (ModSSub). MME stimuli are found to have good noise removal without the introduction of musical noise, which is problematic in spectral subtraction-based enhancement. Results also show that ModSSub has minimal musical noise compared to acoustic spectral subtraction, for an appropriately selected modulation frame duration. For modulation domain-based methods, modulation frame duration is shown to be an important parameter, with quality generally improved by the use of shorter frame durations. From the results of the experiments conducted, it is concluded that the short-time modulation domain provides an effective alternative to the short-time acoustic domain for speech processing, and that in this domain MME provides effective noise suppression without the introduction of musical noise distortion.
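The dual AMS framework described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`stft`, `istft`, `mod_spectral_subtraction`, `enhance`), the Hann window, the frame lengths and hops, the subtraction factor `rho`, the spectral floor `beta`, and the noise estimate taken from the first few modulation frames (assumed to be noise only) are all illustrative assumptions. The sketch performs acoustic analysis, treats the magnitude trajectory of each acoustic frequency bin as a signal for a secondary AMS stage, subtracts an estimated noise power from the modulation magnitude spectrum, and resynthesises using the unmodified modulation and acoustic phase.

```python
import numpy as np

def _hann(n):
    # Periodic Hann window (assumed window choice).
    return np.hanning(n + 1)[:-1]

def stft(x, frame_len, hop):
    """Window overlapping frames of a real 1-D signal and take their FFT.
    Assumes len(x) >= frame_len."""
    win = _hann(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)          # shape: (frames, bins)

def istft(spec, frame_len, hop, length):
    """Weighted overlap-add synthesis, normalised by the summed squared window."""
    win = _hann(frame_len)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(np.fft.irfft(spec, n=frame_len, axis=1)):
        out[i * hop:i * hop + frame_len] += frame * win
        norm[i * hop:i * hop + frame_len] += win ** 2
    return out / np.maximum(norm, 1e-8)         # uncovered samples stay at zero

def mod_spectral_subtraction(mod_spec, noise_frames=2, rho=1.0, beta=0.002):
    """Power spectral subtraction with a spectral floor, applied to the
    modulation spectrum of one acoustic bin. The noise estimate assumes the
    first `noise_frames` modulation frames contain noise only."""
    power = np.abs(mod_spec) ** 2
    noise = power[:noise_frames].mean(axis=0, keepdims=True)
    clean = np.maximum(power - rho * noise, beta * noise)
    # Modified modulation magnitude, original modulation phase.
    return np.sqrt(clean) * np.exp(1j * np.angle(mod_spec))

def enhance(x, a_len=256, a_hop=64, m_len=32, m_hop=4,
            modify=mod_spectral_subtraction):
    """Dual AMS: acoustic STFT, then a secondary AMS on each bin's
    magnitude trajectory, then acoustic synthesis."""
    X = stft(x, a_len, a_hop)                   # acoustic spectrum
    mag, phase = np.abs(X), np.angle(X)
    new_mag = np.empty_like(mag)
    for k in range(mag.shape[1]):               # one trajectory per acoustic bin
        M = stft(mag[:, k], m_len, m_hop)       # modulation spectrum
        new_mag[:, k] = istft(modify(M), m_len, m_hop, mag.shape[0])
    new_mag = np.maximum(new_mag, 0.0)          # magnitudes cannot be negative
    # Recombine with the unmodified (noisy) acoustic phase.
    return istft(new_mag * np.exp(1j * phase), a_len, a_hop, len(x))
```

Calling `enhance(x)` on a 1-D array of samples returns an array of the same length. Passing a different `modify` function, for example one that applies an MMSE-style gain to the modulation magnitude, changes the enhancement rule while leaving the framework untouched. Because this simple synthesis does not pad the signal, the first and last few frames are not fully reconstructed.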

Notes

1. Note that for references made to the magnitude, phase or complex spectra throughout this text, the STFT modifier is implied unless otherwise stated. The acoustic and modulation modifiers are also included to disambiguate between acoustic and modulation domains.

Author information

Authors and Affiliations

Griffith School of Engineering, Nathan Campus, Griffith University, Brisbane, QLD, Australia
Kuldip Paliwal & Belinda Schwerin

Corresponding author

Correspondence to Kuldip Paliwal.

Editor information

Editors and Affiliations

Dept. of Electrical Engineering, Santa Clara University, Santa Clara, California, USA
Tokunbo Ogunfunmi
School of EE&C Engineering, The University of Western Australia, Crawley, West Australia, Australia
Roberto Togneri
Qualcomm Inc., Santa Clara, California, USA
Madihally (Sim) Narasimha

About this chapter

Cite this chapter

Paliwal, K., Schwerin, B. (2015). Modulation Processing for Speech Enhancement. In: Ogunfunmi, T., Togneri, R., Narasimha, M. (eds) Speech and Audio Processing for Coding, Enhancement and Recognition. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-1456-2_10

DOI: https://doi.org/10.1007/978-1-4939-1456-2_10
Published: 18 September 2014
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-1455-5
Online ISBN: 978-1-4939-1456-2
eBook Packages: Engineering, Engineering (R0)
