Signal Analysis and Modelling for Speech Processing

Kubin, Gernot

doi:10.1007/978-1-4612-1768-8_26

Gernot Kubin

Part of the book series: Applied and Numerical Harmonic Analysis (ANHA)

Abstract

This chapter surveys standard and advanced methods for the analysis and modelling of speech signals. It first introduces several speech processing functions as part of voice communication systems technology and then briefly describes human speech production. From this description, a two-tier physical model of speech emerges, comprising the speech organ movements at the articulatory tier and the coupled aerodynamic flow and sound propagation at the aero-acoustic tier. Both physical tiers appear as separate components in most computational speech signal models. The discussion of these components covers both the standard view of linear short-time stationarity and more advanced concepts from non-stationary processes (underspread processes, cyclostationarity) and non-linear systems (neural networks, non-linear oscillators).

Author information

Authors and Affiliations

Institute of Communications and High-Frequency Engineering, Vienna University of Technology, Gusshausstrasse 25/389, A-1040, Vienna, Austria
Gernot Kubin

Authors

Gernot Kubin

Editor information

Editors and Affiliations

Institute of Chemical Technology, Prague, Czech Republic
Ales Procházka
Czech Technical University, Prague, Czech Republic
Jan Uhlíř
University of Cambridge, England, UK
P. W. J. Rayner & N. G. Kingsbury

About this chapter

Cite this chapter

Kubin, G. (1998). Signal Analysis and Modelling for Speech Processing. In: Procházka, A., Uhlíř, J., Rayner, P.W.J., Kingsbury, N.G. (eds) Signal Analysis and Prediction. Applied and Numerical Harmonic Analysis. Birkhäuser, Boston, MA. https://doi.org/10.1007/978-1-4612-1768-8_26

DOI: https://doi.org/10.1007/978-1-4612-1768-8_26
Publisher Name: Birkhäuser, Boston, MA
Print ISBN: 978-1-4612-7273-1
Online ISBN: 978-1-4612-1768-8
eBook Packages: Springer Book Archive
