Abstract
This chapter surveys standard and advanced methods for the analysis and modelling of speech signals. It first introduces several speech processing functions as part of voice communication systems technology and then briefly describes human speech production. From this, a two-tier physical model of speech emerges, which embraces the movements of the speech organs at the articulatory tier and the coupled aerodynamic flow and sound propagation at the aero-acoustic tier. Both physical tiers appear as separate components in most computational speech signal models. The discussion of these models addresses both the standard view of linear short-time stationarity and more advanced concepts from non-stationary processes (underspread processes, cyclostationarity) and non-linear systems (neural networks, non-linear oscillators).
© 1998 Springer Science+Business Media New York
About this chapter
Cite this chapter
Kubin, G. (1998). Signal Analysis and Modelling for Speech Processing. In: Procházka, A., Uhlíř, J., Rayner, P.W.J., Kingsbury, N.G. (eds) Signal Analysis and Prediction. Applied and Numerical Harmonic Analysis. Birkhäuser, Boston, MA. https://doi.org/10.1007/978-1-4612-1768-8_26
DOI: https://doi.org/10.1007/978-1-4612-1768-8_26
Publisher Name: Birkhäuser, Boston, MA
Print ISBN: 978-1-4612-7273-1
Online ISBN: 978-1-4612-1768-8
eBook Packages: Springer Book Archive