Signal Analysis and Modelling for Speech Processing

  • Chapter
Signal Analysis and Prediction

Part of the book series: Applied and Numerical Harmonic Analysis (ANHA)

Abstract

This chapter presents a survey of standard and advanced methods for the analysis and modelling of speech signals. First, it introduces several speech processing functions as part of voice communication systems technology and proceeds to a brief description of human speech production. From this, a two-tier physical model of speech emerges which embraces the speech organ movements at the articulatory tier and the coupled aerodynamic flow and sound propagation at the aero-acoustic tier. Both of these physical tiers appear as separate components in most computational speech signal models. Their discussion addresses both the standard view of linear short-time stationarity and more advanced concepts from non-stationary processes (underspread processes, cyclostationarity) and non-linear systems (neural networks, non-linear oscillators).

Copyright information

© 1998 Springer Science+Business Media New York

About this chapter

Cite this chapter

Kubin, G. (1998). Signal Analysis and Modelling for Speech Processing. In: Procházka, A., Uhlíř, J., Rayner, P.W.J., Kingsbury, N.G. (eds) Signal Analysis and Prediction. Applied and Numerical Harmonic Analysis. Birkhäuser, Boston, MA. https://doi.org/10.1007/978-1-4612-1768-8_26

  • DOI: https://doi.org/10.1007/978-1-4612-1768-8_26

  • Publisher Name: Birkhäuser, Boston, MA

  • Print ISBN: 978-1-4612-7273-1

  • Online ISBN: 978-1-4612-1768-8

  • eBook Packages: Springer Book Archive
