Linear Prediction Synthesis

Dutoit, Thierry

doi:10.1007/978-94-011-5730-8_8

Part of the book series: Text, Speech and Language Technology (TLTB, volume 3)

Abstract

The most basic speech production model used in speech processing is, undoubtedly, the source-filter model. Since its first appearance in Fant (1960) and its simplification into the autoregressive (AR) model, it has given rise to dozens of fruitful interpretations (see Markel and Gray, 1976, for a review), from maximum likelihood to polynomial approaches, through the extensively used spectral interpretation. All of them describe the same mathematical reality, seen in different lights. Instead of providing the reader with yet another overview of these developments, we have chosen to present a somewhat original geometrical interpretation of the linear prediction (LP) framework, an approach that is seldom investigated in the literature (it is introduced in Kroon, 1985, and Alexander, 1986), even though it yields significant insight into autoregressive models¹. Particular care has been taken in its presentation, in order to avoid overly simplistic interpretations, which often result from ill-considered generalization of three-dimensional concepts to N dimensions. A clear understanding of the notions introduced in the first few sections of this chapter will also be of use in Section 8.9, which examines to what extent the glottal autoregressive (GAR) model constitutes an important refinement of the classical AR model.
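The chapter's own geometrical development is not visible in this preview, so the following is only an illustrative sketch of the general idea, not the author's presentation. It treats a signal and its delayed copies as vectors, fits an AR model with the textbook autocorrelation method (Levinson-Durbin recursion), and checks that the prediction error is orthogonal to the delayed copies, which is the projection property a geometrical reading of LP rests on. The test signal, the model order, and all function names are assumptions chosen for the example.

```python
# Sketch: linear prediction seen as an orthogonal projection.
# Assumptions (not from the chapter): autocorrelation method, order 2,
# and a synthetic test signal (impulse response of a stable AR(2) filter).

def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n-k], with x taken as zero outside its window."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns (a, err) with a[0] == 1, so that the prediction error is
    e[n] = sum_k a[k] * x[n-k], and err is the minimum error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def delayed(x, k, length):
    """Vector of x delayed by k samples, zero-padded to the given length."""
    return [x[n - k] if 0 <= n - k < len(x) else 0.0 for n in range(length)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


# Synthetic signal: x[n] = delta[n] + 1.3 x[n-1] - 0.8 x[n-2]
x = []
for n in range(200):
    past1 = x[n - 1] if n >= 1 else 0.0
    past2 = x[n - 2] if n >= 2 else 0.0
    x.append((1.0 if n == 0 else 0.0) + 1.3 * past1 - 0.8 * past2)

order = 2
a, err = levinson_durbin(autocorrelation(x, order), order)

# The signal and its delayed copies, as vectors of equal length.
length = len(x) + order
basis = [delayed(x, k, length) for k in range(order + 1)]

# Prediction error vector: e = x_0 + a[1] * x_1 + a[2] * x_2.
e = [sum(a[k] * basis[k][n] for k in range(order + 1)) for n in range(length)]

# Inner products of the error with each delayed copy (should be close to zero).
dots = [dot(e, basis[k]) for k in range(1, order + 1)]

print("coefficients:", a)
print("error energy:", err)
print("inner products with delayed signals:", dots)
```

In this setting the predicted signal is the projection of the undelayed vector onto the subspace spanned by its delayed copies, and the error is what remains perpendicular to that subspace. A covariance-method (windowless least-squares) fit would show the same orthogonality over a finite analysis interval instead of the zero-padded one used here.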

“It is a poor sort of memory that only works backwards.” Lewis Carroll, Through the Looking-Glass

References

ALEXANDER, S.T., (1986), Adaptive Signal Processing: Theory and Applications, Springer-Verlag, New-York, pp.123–141.
ANANTHAPADMANABHA, T.V., and B. YEGNANARAYANA, (1979), “Epoch Extraction of Voiced Speech”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-27, pp. 309–319.
ATAL, B.S., and N. DAVID, (1979), “On Synthesizing Natural-Sounding Speech by Linear Prediction”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 79, pp. 44–47.
ATAL, B.S., and J.R. REMDE, (1982), “A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 82, vol. 1, pp. 614–617.
BOITE, R., and M. KUNT, (1987), Traitement de la parole, Presses polytechniques romandes, Lausanne.
CADZOW, J.A., (1990), “Signal Processing via Least Squares Error Modeling”, IEEE ASSP Magazine, October, pp. 12–31.
CASPERS, B., and B.S. ATAL, (1983), “Changing Pitch and Duration in LPC Synthesized Speech Using the Multipulse Excitation”, Journal of the Acoustical Society of America, vol. 73, S5.
CHENG, Y.M., and D. O’SHAUGHNESSY, (1989), “Automatic and Reliable Estimation of Glottal Closure Instant and Period”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, n°12.
COURBON, J.L., and F. EMERARD, (1982), “SPARTE: a Text-to-Speech Machine Using Synthesis by Diphones”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 82, pp. 1597–1600.
CRANEN, B., and J. SCHROETER, (1995), “Modeling a Leaky Glottis”, Journal of Phonetics, vol. 23, pp. 165–177.
CROSMER, J.R., and T.P. BARNWELL, (1985), “A Low Bit Rate Segment Vocoder Based on Line Spectrum Pairs”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 85, n° 7.2.
DEL CARPIO, J., (1989), Réalisation d’un Système de Synthèse de la Parole à partir d’un Texte en Langue Espagnole, PhD dissertation, Faculté Polytechnique de Mons.
DELLER, J.R., (1982), “Evaluation of Laryngeal Dysfunction Based on Features of an Accurate Estimate of the Glottal Waveform”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 82, pp. 759–762.
DELSARTE, F., and Y. GENIN, (1987), “On the Splitting of Classical Algorithms in Linear Prediction Theory”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-35, n°5, pp. 645–653.
DEPALLE, P., X. RODET, and G. POIROT, (1990), “Energy and Articulation Rules for Improving Diphone Speech Synthesis”, Proceedings of the First ESCA Workshop on Speech Synthesis, Autrans, pp. 47–50.
FANT, G., (1960), Acoustic Theory of Speech Production, Mouton, The Hague.
FLANAGAN, J.L., (1972), Speech Analysis, Synthesis, and Perception, Springer Verlag, Berlin, p. 77.
FRIES, G., (1994), “Hybrid Time- and Frequency-Domain Speech Synthesis with Extended Glottal Source Generation”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 94, Adelaide, vol. I, pp. 581–584.
FUJISAKI, H., and M. LJUNGQVIST, (1986), “Proposal and Evaluation of Models for the Glottal Source Waveform”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 86, Tokyo, 31.2, pp. 1605–1607.
FUJISAKI, H., and M. LJUNGQVIST, (1987), “Estimation of Voice Source and Vocal Tract Parameters Based on ARMA Analysis and a Model for the Glottal Source Waveform”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 87, 15.4, pp. 637–640.
FULDSETH, A., E. HARBORG, F.T. JOHANSEN, and J.E. KNUDSEN, (1991), “A Real-Time Implementable 7 kHz Speech Coder at 16 kbits/s”, Proceedings of Eurospeech 91, pp. 897–900.
GOLUB, G.H., and C.F. VAN LOAN, (1989), Matrix Computations, Johns Hopkins University Press, London, p. 243.
HEDELIN, P., (1984), “A Glottal LPC Vocoder”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 84, pp. 1.6.1–1.6.4.
HEDELIN, P., (1988), “Phase Compensation in All-Pole Speech Analysis”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 88, 8.10, pp. 339–342.
HEDELIN, P., (1989), “High-Quality LPC Vocoding”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 89, 9.9, pp. 465–468.
HESS, W., (1992), “Pitch and Voicing Determination”, in Advances in Speech Signal Processing, S. Furui, M. Sondhi, eds., Dekker, New York, pp. 3–48.
HUNT, M.J., J.S. BRIDLE, and J.N. HOLMES, (1978), “Interactive Digital Inverse Filtering and its Relation to Linear Prediction Methods”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 78, pp. 15–19.
HUNT, M.J., D.A. ZWIERZYNSKI, and R.C. CARR, (1989), “Issues in High-Quality LPC Analysis and Synthesis”, Proceedings of Eurospeech 89, vol. 2, pp. 348–351.
ISAKSSON, A., and M. MILLNERT, (1989), “Inverse Glottal Filtering Using a Parameterized Input Model”, Signal Processing, n°18, pp. 435–445.
ITAKURA, F., and S. SAITO, (1969), “Speech Analysis-Synthesis System based on the Partial Autocorrelation Coefficients”, Proceedings of the Acoustical Society of Japan Meeting.
ITAKURA, F., (1975), “Line Spectrum Representation of Linear Predictive Coefficients of Speech Signals”, Journal of the Acoustical Society of America, vol. 57, S35(A).
KABAL, P., and P. RAMACHANDRAN, (1986), “The Computation of Line Spectral Frequencies using Tchebyshev Polynomials”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-34, n°6.
KANG, G.S., and S.S. EVERETT, (1985), “Improvement of the Excitation Source in the Narrow-Band Linear Prediction Vocoder”, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-33, n°2, pp. 377–386.
KAY, S. M., (1988), Modern Spectral Estimation, Prentice Hall Signal Processing Series, p. 222.
KROON, P., (1985), Time-Domain Coding of (Near) Toll Quality Speech at Rates below 16 kb/s, Ph.D. dissertation, Technische Hogeschool, Delft.
LOBO, A.P., and W.A. AINSWORTH, (1989), “Evaluation of a Glottal ARMA Modeling Scheme”, Proceedings of Eurospeech 89, vol. 2, pp. 27–30.
McAULAY, R.J., and T.F. QUATIERI, (1986), “Speech Analysis/Synthesis based on Sinusoidal Representation”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, pp. 744–754.
MAKHOUL, J., R. VISWANATHAN, R. SCHWARTZ, and A.W.F. HUGGINS, (1978), “A Mixed-Source Model for Compression and Synthesis”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 78, pp. 163–166.
MALLAT, S. G., (1989), “Multiresolution Approximations and Wavelet Orthonormal Bases of L²(R)”, Transactions of the American Mathematical Society, vol. 315, n°1, pp. 69–87.
MARKEL, J.D., and A.H. GRAY Jr, (1976), Linear Prediction of Speech, Springer Verlag, New York, pp. 10–42.
MILENKOVIC, P., (1986), “Glottal Inverse Filtering by Joint Estimation of an AR System with a Linear Input Model”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, n°1.
MOULINES, E., and F. CHARPENTIER, (1988), “Diphone Synthesis Using a Multipulse LPC Technique”, Proceedings of the seventh FASE International Conference, Edinburgh, pp. 47–53.
OLIVEIRA, L.C., (1993), “Estimation of Source Parameters by Frequency Analysis”, Proceedings of Eurospeech 93, Berlin, vol. 1, pp. 99–102.
PAPAMICHALIS, P.E., (1987), Practical Approaches to Speech Coding, Prentice Hall.
ROY, G., and P. KABAL, (1991), “Wideband Speech Coding at 16 kbits/sec”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 91, vol. 1, pp. 17–20.
STRIK, H., B. CRANEN, and L. BOVES, (1993), “Fitting an LF-model to Inverse Filter Signals”, Proceedings of Eurospeech 93, Berlin, vol. 1, pp. 103–106.
STRUBE, H.W., (1974), “Determination of the Instant of Glottal Closure from the Speech Wave”, Journal of the Acoustical Society of America, vol. 56, pp. 1625–1629.
SUGAMURA, N., and F. ITAKURA, (1986), “Speech Analysis and Synthesis Methods Developed at ECL in NTT — From LPC to LSP”, Speech Communication, June, pp. 199–215.
TREMAIN, T.E., (1982), “The Government Standard Linear Predictive Coding Algorithm: LPC-10”, Speech Technology, vol. 1, n°2, April, pp. 40–49.
VAN COILE, B.W., and J.P. MARTENS, (1989), “Dutch Text-to-Speech Aids for the Vocally Handicapped”, Proceedings of Eurospeech 89, vol. 1, pp. 590–593.
VARGA, A., and F. FALLSIDE, (1987), “A Technique for Using Multipulse Linear Predictive Speech Synthesis in Text-to-Speech Type Systems”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, pp. 386–387.
WHITE, S., (1990), “Codeur CELP à Débit Variable: Application au Codage de Diphones”, Proc. 19èmes Journées d’Etudes sur la Parole, Montréal.
WONG, D.Y., J.D. MARKEL, and A.H. GRAY Jr, (1979), “Least Squares Glottal Inverse Filtering of the Acoustic Speech Waveform”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-27, n°4, pp. 350–353.

Author information

Authors and Affiliations

Faculté Polytechnique de Mons, Mons, Belgium
Thierry Dutoit

About this chapter

Cite this chapter

Dutoit, T. (1997). Linear Prediction Synthesis. In: An Introduction to Text-to-Speech Synthesis. Text, Speech and Language Technology, vol 3. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-5730-8_8

DOI: https://doi.org/10.1007/978-94-011-5730-8_8
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-0369-1
Online ISBN: 978-94-011-5730-8
eBook Packages: Springer Book Archive
