Abstract
The most basic speech production model used in speech processing is undoubtedly the source-filter model. Since its first appearance in Fant (1960) and its simplification into the auto-regressive (AR) model, it has given rise to dozens of fruitful interpretations (see Markel and Gray, 1976, for a review), from maximum likelihood to polynomial approaches, through the extensively used spectral interpretation. All of them describe the same mathematical reality seen in different lights. Instead of providing the reader with yet another overview of these developments, we have chosen to present a somewhat original geometrical interpretation of the linear prediction (LP) framework, an approach that is seldom investigated in the literature (it is introduced in Kroon, 1985, and Alexander, 1986), even though it yields significant insight into auto-regressive models. Particular care has been taken in its presentation, in order to avoid the overly simplistic interpretations that often result from ill-considered generalization of three-dimensional concepts to N dimensions. A clear understanding of the notions introduced in the first few sections of this chapter will also be of use in Section 8.9, which examines to what extent the glottal auto-regressive (GAR) model constitutes an important refinement of the classical AR model.
It is a poor sort of memory that only works backwards. Lewis Carroll, Through the Looking Glass
References
ALEXANDER, S.T., (1986), Adaptive Signal Processing: Theory and Applications, Springer-Verlag, New York, pp. 123–141.
ANANTHAPADMANABHA, T.V., and B. YEGNANARAYANA, (1979), “Epoch Extraction of Voiced Speech”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-27, pp. 309–319.
ATAL, B.S., and N. DAVID, (1979), “On Synthesizing Natural-Sounding Speech by Linear Prediction”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 79, pp. 44–47.
ATAL, B.S., and J.R. REMDE, (1982), “A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 82, vol. 1, pp. 614–617.
BOITE, R., and M. KUNT, (1987), Traitement de la parole, Presses polytechniques romandes, Lausanne.
CADZOW, J.A., (1990), “Signal Processing via Least Squares Error Modeling”, IEEE ASSP Magazine, October, pp. 12–31.
CASPERS, B., and B.S. ATAL, (1983), “Changing Pitch and Duration in LPC Synthesized Speech Using the Multipulse Excitation”, Journal of the Acoustical Society of America, vol. 73, S5.
CHENG, Y.M., and D. O’SHAUGHNESSY, (1989), “Automatic and Reliable Estimation of Glottal Closure Instant and Period”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, n°12.
COURBON, J.L., and F. EMERARD, (1982), “SPARTE: a Text-to-Speech Machine Using Synthesis by Diphones”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 82, pp. 1597–1600.
CRANEN, B., and J. SCHROETER, (1995), “Modeling a Leaky Glottis”, Journal of Phonetics, vol. 23, pp. 165–177.
CROSMER, J.R., and T.P. BARNWELL, (1985), “A Low Bit Rate Segment Vocoder Based on Line Spectrum Pairs”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 85, n° 7.2.
DEL CARPIO, J., (1989), Réalisation d’un Système de Synthèse de la Parole à partir d’un Texte en Langue Espagnole, PhD dissertation, Faculté Polytechnique de Mons.
DELLER, J.R., (1982), “Evaluation of Laryngeal Dysfunction Based on Features of an Accurate Estimate of the Glottal Waveform”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 82, pp. 759–762.
DELSARTE, F., and Y. GENIN, (1987), “On the Splitting of Classical Algorithms in Linear Prediction Theory”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-35, n°5, pp. 645–653.
DEPALLE, P., X. RODET, and G. POIROT, (1990), “Energy and Articulation Rules for Improving Diphone Speech Synthesis”, Proceedings of the First ESCA Workshop on Speech Synthesis, Autrans, pp. 47–50.
FANT, G., (1960), Acoustic Theory of Speech Production, Mouton, The Hague.
FLANAGAN, J.L., (1972), Speech Analysis, Synthesis, and Perception, Springer-Verlag, Berlin, p. 77.
FRIES, G., (1994), “Hybrid Time- and Frequency-Domain Speech Synthesis with Extended Glottal Source Generation”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 94, Adelaide, vol. I, pp. 581–584.
FUJISAKI, H., and M. LJUNGQVIST, (1986), “Proposal and Evaluation of Models for the Glottal Source Waveform”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 86, Tokyo, 31.2, pp. 1605–1607.
FUJISAKI, H., and M. LJUNGQVIST, (1987), “Estimation of Voice Source and Vocal Tract Parameters Based on ARMA Analysis and a Model for the Glottal Source Waveform”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 87, 15.4, pp. 637–640.
FULDSETH, A., E. HARBORG, F.T. JOHANSEN, and J.E. KNUDSEN, (1991), “A Real-Time Implementable 7 kHz Speech Coder at 16 kbits/s”, Proceedings of Eurospeech 91, pp. 897–900.
GOLUB, G.H., and C.F. VAN LOAN, (1989), Matrix Computations, Johns Hopkins University Press, London, p. 243.
HEDELIN, P., (1984), “A Glottal LPC Vocoder”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 84, pp. 1.6.1–1.6.4.
HEDELIN, P., (1988), “Phase Compensation in All-Pole Speech Analysis”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 88, 8.10, pp. 339–342.
HEDELIN, P., (1989), “High-Quality LPC Vocoding”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 89, 9.9, pp. 465–468.
HESS, W., (1992), “Pitch and Voicing Determination”, in Advances in Speech Signal Processing, S. Furui, M. Sondhi, eds., Dekker, New York, pp. 3–48.
HUNT, M.J., J.S. BRIDLE, and J.N. HOLMES, (1978), “Interactive Digital Inverse Filtering and its Relation to Linear Prediction Methods”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 78, pp. 15–19.
HUNT, M.J., D.A. ZWIERZYNSKI, and R.C. CARR, (1989), “Issues in High-Quality LPC Analysis and Synthesis”, Proceedings of Eurospeech 89, vol. 2, pp. 348–351.
ISAKSSON, A., and M. MILLNERT, (1989), “Inverse Glottal Filtering Using a Parameterized Input Model”, Signal Processing, n°18, pp. 435–445.
ITAKURA, F., and S. SAITO, (1969), “Speech Analysis-Synthesis System based on the Partial Autocorrelation Coefficients”, Proceedings of the Acoustical Society of Japan Meeting.
ITAKURA, F., (1975), “Line Spectrum Representation of Linear Predictive Coefficients of Speech Signals”, Journal of the Acoustical Society of America, vol. 57, S35(A).
KABAL, P., and P. RAMACHANDRAN, (1986), “The Computation of Line Spectral Frequencies using Tchebyshev Polynomials”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-34, n°6.
KANG, G.S., and S.S. EVERETT, (1985), “Improvement of the Excitation Source in the Narrow-Band Linear Prediction Vocoder”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-33, n°2, pp. 377–386.
KAY, S.M., (1988), Modern Spectral Estimation, Prentice Hall Signal Processing Series, p. 222.
KROON, P., (1985), Time-Domain Coding of (Near) Toll Quality Speech at Rates below 16 kB/s, Ph.D. dissertation, Technische Hogeschool, Delft.
LOBO, A.P., and W.A. AINSWORTH, (1989), “Evaluation of a Glottal ARMA Modeling Scheme”, Proceedings of Eurospeech 89, vol. 2, pp. 027–030.
MAC AULAY, R.J., and T.F. QUATIERI, (1986), “Speech Analysis/Synthesis based on Sinusoidal Representation”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, pp. 744–754.
MAKHOUL, J., R. VISWANATHAN, R. SCHWARTZ, and A.W.F. HUGGINS, (1978), “A Mixed-Source Model for Compression and Synthesis”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 78, pp. 163–166.
MALLAT, S.G., (1989), “Multiresolution Approximations and Wavelet Orthonormal Bases of L2(R)”, Transactions of the American Mathematical Society, vol. 315, n°1, pp. 69–87.
MARKEL, J.D., and A.H. GRAY Jr, (1976), Linear Prediction of Speech, Springer-Verlag, New York, pp. 10–42.
MILENKOVIC, P., (1986), “Glottal Inverse Filtering by Joint Estimation of an AR System with a Linear Input Model”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, n°1.
MOULINES, E., and F. CHARPENTIER, (1988), “Diphone Synthesis Using a Multipulse LPC Technique”, Proceedings of the seventh FASE International Conference, Edinburgh, pp. 47–53.
OLIVEIRA, L.C., (1993), “Estimation of Source Parameters by Frequency Analysis”, Proceedings of Eurospeech 93, Berlin, vol. 1, pp. 99–102.
PAPAMICHALIS, P.E., (1987), Practical Approaches to Speech Coding, Prentice Hall.
ROY, G., and P. KABAL, (1991), “Wideband Speech Coding at 16 kbits/sec”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 91, vol. 1, pp. 17–20.
STRIK, H., B. CRANEN, and L. BOVES, (1993), “Fitting an LF-model to Inverse Filter Signals”, Proceedings of Eurospeech 93, Berlin, vol. 1, pp. 103–106.
STRUBE, H.W., (1974), “Determination of the Instant of Glottal Closure from the Speech Wave”, Journal of the Acoustical Society of America, vol. 56, pp. 1625–1629.
SUGAMURA, N., and F. ITAKURA, (1986), “Speech Analysis and Synthesis Methods Developed at ECL in NTT — From LPC to LSP”, Speech Communication, June, pp. 199–215.
TREMAIN, T.E., (1982), “The Government Standard Linear Predictive Coding Algorithm: LPC-10”, Speech Technology, vol. 1, n°2, April, pp. 40–49.
VAN COILE, B.W., and J.P. MARTENS, (1989), “Dutch Text-to-Speech Aids for the Vocally Handicapped”, Proceedings of Eurospeech 89, vol. 1, pp. 590–593.
VARGA, A., and F. FALLSIDE, (1987), “A Technique for Using Multipulse Linear Predictive Speech Synthesis in Text-to-Speech Type Systems”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, pp. 386–387.
WHITE, S., (1990), “Codeur CELP à Débit Variable: Application au Codage de Diphones”, Proc. 19èmes Journées d’Etudes sur la Parole, Montréal.
WONG, D.Y., J.D. MARKEL, and A.H. GRAY Jr, (1979), “Least Squares Glottal Inverse Filtering of the Acoustic Speech Waveform”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-27, n°4, pp. 350–353.
Copyright information
© 1997 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Dutoit, T. (1997). Linear Prediction Synthesis. In: An Introduction to Text-to-Speech Synthesis. Text, Speech and Language Technology, vol 3. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-5730-8_8
DOI: https://doi.org/10.1007/978-94-011-5730-8_8
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-0369-1
Online ISBN: 978-94-011-5730-8
eBook Packages: Springer Book Archive