Part of the book series: Text, Speech and Language Technology ((TLTB,volume 3))

Abstract

The most basic speech production model used in speech processing is undoubtedly the source-filter model. Since 1960, the year of its first appearance in Fant (1960) and its simplification into the auto-regressive (AR) model, it has given rise to dozens of fruitful interpretations (see Markel and Gray, 1976, for a review), from maximum likelihood to polynomial approaches, through the extensively used spectral interpretation. All of them describe the same mathematical reality seen in different lights. Instead of providing the reader with yet another overview of these developments, we have chosen to present a somewhat original geometrical interpretation of the linear prediction (LP) framework, an approach that is seldom investigated in the literature (it is introduced in Kroon, 1985, and Alexander, 1986), even though it yields significant insight into auto-regressive models. Particular care has been taken in its presentation, in order to avoid the overly simplistic interpretations that often result from ill-considered generalization of three-dimensional concepts to N dimensions. A clear understanding of the notions introduced in the first few sections of this chapter will also be of use in Section 8.9, which examines to what extent the glottal autoregressive (GAR) model constitutes an important refinement of the classical AR model.
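As a rough illustration of what a geometrical reading of linear prediction can look like (a minimal sketch, not the chapter's own derivation, which is not reproduced on this page), the least-squares predictor can be viewed as the orthogonal projection of the signal vector onto the subspace spanned by its delayed copies, so that the prediction error is orthogonal to every delayed copy. The helper names `lp_least_squares` and `synth_ar2`, the toy AR(2) coefficients, and the choice of a covariance-style least-squares fit are all assumptions made for this example.

```python
import numpy as np


def lp_least_squares(x, order):
    """Least-squares linear prediction (hypothetical helper, covariance-style).

    Returns (a, e, X): predictor coefficients, prediction-error vector, and
    the matrix whose columns are delayed copies of the signal.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column k holds x delayed by k + 1 samples, aligned with x[order:].
    X = np.column_stack(
        [x[order - 1 - k : n - 1 - k] for k in range(order)]
    )
    target = x[order:]
    # Minimizing ||target - X a|| projects target onto the column space of X.
    a, *_ = np.linalg.lstsq(X, target, rcond=None)
    e = target - X @ a
    return a, e, X


def synth_ar2(n, a1, a2, seed=0):
    """Toy AR(2) signal: x[i] = a1*x[i-1] + a2*x[i-2] + white noise."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)
    x = np.zeros(n)
    for i in range(n):
        x[i] = u[i]
        if i >= 1:
            x[i] += a1 * x[i - 1]
        if i >= 2:
            x[i] += a2 * x[i - 2]
    return x


# Assumed toy coefficients (stable filter); not taken from the chapter.
x = synth_ar2(2000, a1=1.3, a2=-0.8)
a, e, X = lp_least_squares(x, order=2)
print("estimated coefficients:", a)
# The residual is orthogonal to each delayed signal, up to rounding error.
print("inner products with residual:", X.T @ e)
```

The estimated coefficients should land near the values used to synthesize the signal, and the inner products should vanish up to floating-point rounding, which is the orthogonality property that the projection view rests on.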

It is a poor sort of memory that only works backwards. Lewis Carroll, Through the Looking Glass

References

  • ALEXANDER, S.T., (1986), Adaptive Signal Processing: Theory and Applications, Springer-Verlag, New York, pp. 123–141.

  • ANANTHAPADMANABHA, T.V., and B. YEGNANARAYANA, (1979), “Epoch Extraction of Voiced Speech”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-27, pp. 309–319.

  • ATAL, B.S., and N. DAVID, (1979), “On Synthesizing Natural-Sounding Speech by Linear Prediction”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 79, pp. 44–47.

  • ATAL, B.S., and J.R. REMDE, (1982), “A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 82, vol. 1, pp. 614–617.

  • BOITE, R., and M. KUNT, (1987), Traitement de la parole, Presses polytechniques romandes, Lausanne.

  • CADZOW, J. A., (1990), “Signal Processing via Least Squares Error Modeling”, IEEE ASSP Magazine, October, pp. 12–31.

  • CASPERS, B., and B.S. ATAL, (1983), “Changing Pitch and Duration in LPC Synthesized Speech Using the Multipulse Excitation”, Journal of the Acoustical Society of America, vol. 73, S5.

  • CHENG, Y.M., and D. O’SHAUGHNESSY, (1989), “Automatic and Reliable Estimation of Glottal Closure Instant and Period”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, n°12.

  • COURBON, J.L., and F. EMERARD, (1982), “SPARTE: a Text-to-Speech Machine Using Synthesis by Diphones”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 82, pp. 1597–1600.

  • CRANEN, B., and J. SCHROETER, (1995), “Modeling a Leaky Glottis”, Journal of Phonetics, 23, pp. 165–177.

  • CROSMER, J.R., and T.P. BARNWELL, (1985), “A Low Bit Rate Segment Vocoder Based on Line Spectrum Pairs”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 85, n° 7.2.

  • DEL CARPIO, J., (1989), Réalisation d’un Système de Synthèse de la Parole à partir d’un Texte en Langue Espagnole, PhD dissertation, Faculté Polytechnique de Mons.

  • DELLER, J.R., (1982), “Evaluation of Laryngeal Dysfunction Based on Features of an Accurate Estimate of the Glottal Waveform”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 82, pp. 759–762.

  • DELSARTE, F., and Y. GENIN, (1987), “On the Splitting of Classical Algorithms in Linear Prediction Theory”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-35, n°5, pp. 645–653.

  • DEPALLE, P., X. RODET, and G. POIROT, (1990), “Energy and Articulation Rules for Improving Diphone Speech Synthesis”, Proceedings of the First ESCA Workshop on Speech Synthesis, Autrans, pp. 47–50.

  • FANT, G., (1960), Acoustic Theory of Speech Production, Mouton, The Hague.

  • FLANAGAN, J.L., (1972), Speech Analysis, Synthesis, and Perception, Springer Verlag, Berlin, p. 77.

  • FRIES, G., (1994), “Hybrid Time- and Frequency-Domain Speech Synthesis with Extended Glottal Source Generation”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 94, Adelaide, vol. I, pp. 581–584.

  • FUJISAKI, H., and M. LJUNGQVIST, (1986), “Proposal and Evaluation of Models for the Glottal Source Waveform”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 86, Tokyo, 31.2, pp. 1605–1607.

  • FUJISAKI, H., and M. LJUNGQVIST, (1987), “Estimation of Voice Source and Vocal Tract Parameters Based on ARMA Analysis and a Model for the Glottal Source Waveform”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 87, 15.4, pp. 637–640.

  • FULDSETH, A., E. HARBORG, F.T. JOHANSEN, and J.E. KNUDSEN, (1991), “A Real-Time Implementable 7 kHz Speech Coder at 16 kbits/s”, Proceedings of Eurospeech 91, pp. 897–900.

  • GOLUB, G.H., and C.F. VAN LOAN, (1989), Matrix Computations, Johns Hopkins University Press, London, p. 243.

  • HEDELIN, P., (1984), “A Glottal LPC Vocoder”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 84, pp. 1.6.1–1.6.4.

  • HEDELIN, P., (1988), “Phase Compensation in All-Pole Speech Analysis”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 88, 8.10, pp. 339–342.

  • HEDELIN, P., (1989), “High-Quality LPC Vocoding”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 89, 9.9, pp. 465–468.

  • HESS, W., (1992), “Pitch and Voicing Determination”, in Advances in Speech Signal Processing, S. Furui, M. Sondhi, eds., Dekker, New York, pp. 3–48.

  • HUNT, M.J., J.S. BRIDLE, and J.N. HOLMES, (1978), “Interactive Digital Inverse Filtering and its Relation to Linear Prediction Methods”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 78, pp. 15–19.

  • HUNT, M.J., D.A. ZWIERZYNSKI, and R.C. CARR, (1989), “Issues in High-Quality LPC Analysis and Synthesis”, Proceedings of Eurospeech 89, vol. 2, pp. 348–351.

  • ISAKSSON, A., and M. MILLNERT, (1989), “Inverse Glottal Filtering Using a Parameterized Input Model”, Signal Processing, n°18, pp. 435–445.

  • ITAKURA, F., and S. SAITO, (1969), “Speech Analysis-Synthesis System based on the Partial Autocorrelation Coefficients”, Proceedings of the Acoustical Society of Japan Meeting.

  • ITAKURA, F., (1975), “Line Spectrum Representation of Linear Predictive Coefficients of Speech Signals”, Journal of the Acoustical Society of America, vol. 57, S35(A).

  • KABAL, P., and P. RAMACHANDRAN, (1986), “The Computation of Line Spectral Frequencies using Tchebyshev Polynomials”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-34, n°6.

  • KANG, G.S., and S.S. EVERETT, (1985), “Improvement of the Excitation Source in the Narrow-Band Linear Prediction Vocoder”, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-33, n°2, pp. 377–386.

  • KAY, S. M., (1988), Modern Spectral Estimation, Prentice Hall Signal Processing Series, p. 222.

  • KROON, P., (1985), Time-Domain Coding of (Near) Toll Quality Speech at Rates below 16 kB/s, Ph.D. dissertation, Technische Hogeschool, Delft.

  • LOBO, A.P., and W.A. AINSWORTH, (1989), “Evaluation of a Glottal ARMA Modeling Scheme”, Proceedings of Eurospeech 89, Vol. 2, pp. 027–030.

  • McAULAY, R.J., and T.F. QUATIERI, (1986), “Speech Analysis/Synthesis Based on a Sinusoidal Representation”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, pp. 744–754.

  • MAKHOUL, J., R. VISWANATHAN, R. SCHWARTZ, and A.W.F. HUGGINS, (1978), “A Mixed-Source Model for Speech Compression and Synthesis”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 78, pp. 163–166.

  • MALLAT, S. G., (1989), “Multiresolution Approximations and Wavelet Orthonormal Bases of L2(R)”, Transactions of the American Mathematical Society, vol. 315, n°1, pp. 69–87.

  • MARKEL, J.D., and A.H. GRAY Jr, (1976), Linear Prediction of Speech, Springer Verlag, New York, pp. 10–42.

  • MILENKOVIC, P., (1986), “Glottal Inverse Filtering by Joint Estimation of an AR System with a Linear Input Model”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, n°1.

  • MOULINES, E., and F. CHARPENTIER, (1988), “Diphone Synthesis Using a Multipulse LPC Technique”, Proceedings of the seventh FASE International Conference, Edinburgh, pp. 47–53.

  • OLIVEIRA, L.C., (1993), “Estimation of Source Parameters by Frequency Analysis”, Proceedings of Eurospeech 93, Berlin, vol. 1, pp. 99–102.

  • PAPAMICHALIS, P.E., (1987), Practical Approaches to Speech Coding, Prentice Hall.

  • ROY, G., and P. KABAL, (1991), “Wideband Speech Coding at 16 kbits/sec”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 91, vol. 1, pp. 17–20.

  • STRIK, H., B. CRANEN, and L. BOVES, (1993), “Fitting an LF-model to Inverse Filter Signals”, Proceedings of Eurospeech 93, Berlin, vol. 1, pp. 103–106.

  • STRUBE, H.W., (1974), “Determination of the Instant of Glottal Closure from the Speech Wave”, Journal of the Acoustical Society of America, vol. 56, pp. 1625–1629.

  • SUGAMURA, N., and F. ITAKURA, (1986), “Speech Analysis and Synthesis Methods Developed at ECL in NTT — From LPC to LSP”, Speech Communication, June, pp. 199–215.

  • TREMAIN, T.E., (1982), “The Government Standard Linear Predictive Coding Algorithm: LPC-10”, Speech Technology, vol. 1, n°2, April, pp. 40–49.

  • VAN COILE, B.W., and J.P. MARTENS, (1989), “Dutch Text-to-Speech Aids for the Vocally Handicapped”, Proceedings of Eurospeech 89, vol.1, pp. 590–593.

  • VARGA, A., and F. FALLSIDE, (1987), “A Technique for Using Multipulse Linear Predictive Speech Synthesis in Text-to-Speech Type Systems”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, pp. 386–387.

  • WHITE, S., (1990), “Codeur CELP à Débit Variable: Application au Codage de Diphones”, Proc. 19èmes Journées d’Etudes sur la Parole, Montréal.

  • WONG, D.Y., J.D. MARKEL, and A.H. GRAY Jr, (1979), “Least Squares Glottal Inverse Filtering of the Acoustic Speech Waveform”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-27, n°4, pp. 350–353.

Copyright information

© 1997 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Dutoit, T. (1997). Linear Prediction Synthesis. In: An Introduction to Text-to-Speech Synthesis. Text, Speech and Language Technology, vol 3. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-5730-8_8

  • DOI: https://doi.org/10.1007/978-94-011-5730-8_8

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-0369-1

  • Online ISBN: 978-94-011-5730-8

  • eBook Packages: Springer Book Archive
