
Part of the book series: The Kluwer International Series in Engineering and Computer Science (SECS, volume 341)


Summary

After presenting the basic principles of speech analysis, we focus on the mathematical techniques that underlie most of the methods currently used in speech processing, such as the Fourier transform and linear prediction analysis. We then review the parameter sets typically proposed to encode the speech signal prior to recognition. While these methods give a reasonable representation of speech spectra, they do not localize a signal's spectral components very accurately in time. Two classes of techniques with the potential to address this problem, time-frequency analyses and wavelets, are presented. Finally, we address the problem of robust speech analysis and give a brief overview of the fields of higher-order spectral analysis and auditory modeling, illustrating our presentation with recent applications of these techniques to speech processing. We conclude the chapter by noting the limits of standard analysis methods in the presence of noise.

Copyright information

© 1996 Kluwer Academic Publishers

About this chapter

Cite this chapter

Junqua, J.-C., Haton, J.-P. (1996). Background on Speech Analysis. In: Robustness in Automatic Speech Recognition. The Kluwer International Series in Engineering and Computer Science, vol 341. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1297-0_2

  • DOI: https://doi.org/10.1007/978-1-4613-1297-0_2

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4612-8555-7

  • Online ISBN: 978-1-4613-1297-0

  • eBook Packages: Springer Book Archive
