Spectral Envelopes and Additive + Residual Analysis/Synthesis

  • Chapter

Part of the book series: Modern Acoustics and Signal Processing (MASP)

Abstract

The subject of this chapter is the estimation, representation, modification, and use of spectral envelopes in the context of sinusoidal-additive-plus-residual analysis/synthesis. A spectral envelope is an amplitude-vs-frequency function, which may be obtained from the envelope of a short-time spectrum (Rodet et al., 1987; Schwarz, 1998). [Precise definitions of such an envelope and short-time spectrum (STS) are given in Section 2.] The additive-plus-residual analysis/synthesis method is based on a representation of signals in terms of a sum of time-varying sinusoids and of a non-sinusoidal residual signal [e.g., see Serra (1989), Laroche et al. (1993), McAulay and Quatieri (1995), and Ding and Qian (1997)]. Many musical sound signals may be described as a combination of a nearly periodic waveform and colored noise. The nearly periodic part of the signal can be viewed as a sum of sinusoidal components, called partials, with time-varying frequency and amplitude. Such sinusoidal components are easily observed on a spectral analysis display (Fig. 5.1) obtained, for instance, from a discrete Fourier transform.
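The additive-plus-residual model and the idea of an envelope drawn through partial peaks can be illustrated with a small, dependency-free Python sketch. It is not the chapter's method. The functions `synthesize`, `short_time_spectrum_db`, `pick_peaks`, and `envelope_db` are hypothetical names. The sketch also makes several assumptions:

* The residual is white Gaussian noise.
* The short-time spectrum is a Hann-windowed DFT computed directly (an FFT would be used in practice).
* The envelope is a piecewise-linear interpolation of partial levels in dB. This is a crude stand-in for more refined estimators such as linear prediction or cepstral methods.

```python
import cmath
import math
import random


def synthesize(partials, noise_std, fs, n_samples, seed=0):
    """Additive-plus-residual signal: a sum of sinusoidal partials plus noise.

    Each partial is (f_start, f_end, a_start, a_end); its frequency (Hz) and
    amplitude ramp linearly over the signal, so the partials are time-varying.
    The residual is assumed here to be white Gaussian noise.
    """
    rng = random.Random(seed)
    out = [rng.gauss(0.0, noise_std) for _ in range(n_samples)]  # residual
    for f0, f1, a0, a1 in partials:
        phase = 0.0
        for n in range(n_samples):
            t = n / max(n_samples - 1, 1)  # normalized time in [0, 1]
            out[n] += (a0 + (a1 - a0) * t) * math.cos(phase)
            # Integrate the instantaneous frequency to get the phase.
            phase += 2.0 * math.pi * (f0 + (f1 - f0) * t) / fs
    return out


def short_time_spectrum_db(x, start, n_fft):
    """Magnitude in dB of a Hann-windowed DFT of x[start:start + n_fft].

    Returns bins 0..n_fft/2, scaled so that a sinusoid of amplitude A that
    falls exactly on a bin reads 20*log10(A) dB.
    """
    frame = x[start:start + n_fft]
    if len(frame) != n_fft:
        raise ValueError("frame runs past the end of the signal")
    win = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft) for n in range(n_fft)]
    xw = [s * w for s, w in zip(frame, win)]
    twiddle = [cmath.exp(-2j * math.pi * m / n_fft) for m in range(n_fft)]
    scale = 2.0 / sum(win)  # undo the window's coherent gain
    mags = []
    for k in range(n_fft // 2 + 1):
        # Direct O(N^2) DFT; an FFT would be used in practice.
        acc = sum(xw[n] * twiddle[(k * n) % n_fft] for n in range(n_fft))
        mags.append(20.0 * math.log10(scale * abs(acc) + 1e-12))
    return mags


def pick_peaks(mags_db, fs, n_fft, floor_db):
    """Local maxima above floor_db, as (frequency_hz, level_db) pairs."""
    peaks = []
    for k in range(1, len(mags_db) - 1):
        m = mags_db[k]
        if m > floor_db and m >= mags_db[k - 1] and m > mags_db[k + 1]:
            peaks.append((k * fs / n_fft, m))
    return peaks


def envelope_db(peaks, freq_hz):
    """Piecewise-linear spectral envelope (dB) through the partial peaks.

    Held constant below the first peak and above the last one.
    """
    if freq_hz <= peaks[0][0]:
        return peaks[0][1]
    for (f_lo, a_lo), (f_hi, a_hi) in zip(peaks, peaks[1:]):
        if freq_hz <= f_hi:
            return a_lo + (a_hi - a_lo) * (freq_hz - f_lo) / (f_hi - f_lo)
    return peaks[-1][1]


if __name__ == "__main__":
    fs, n_fft = 8000.0, 1024
    # Four harmonics gliding from 200*h to 201*h Hz, amplitudes halving per harmonic.
    partials = [(200.0 * h, 201.0 * h, 0.5 ** (h - 1), 0.5 ** (h - 1)) for h in range(1, 5)]
    x = synthesize(partials, noise_std=0.001, fs=fs, n_samples=4096)
    peaks = pick_peaks(short_time_spectrum_db(x, 1024, n_fft), fs, n_fft, floor_db=-24.0)
    for f, level in peaks:
        print(f"partial near {f:7.1f} Hz at {level:6.1f} dB")
    print(f"envelope at 500 Hz: {envelope_db(peaks, 500.0):.1f} dB")
```

In the demo, the harmonic amplitudes halve at each step, so the detected peaks should fall roughly 6 dB apart. The exact levels depend on where each partial lands relative to the DFT bins, because the Hann window loses up to about 1.4 dB between bins. The fixed `floor_db` threshold is an assumption chosen to sit above the Hann sidelobes for this example. It is not a general rule.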

References

  • Allen, J., Hunnicutt, M. S., and Klatt, D. (1987). From Text to Speech, The MITalk System (Cambridge University Press, New York).

  • Atal, B. S. and Hanauer, S. L. (1971). “Speech analysis and synthesis by linear prediction of the speech wave,” J. Acoust. Soc. Am. 50, 637–655.

  • Atal, B. S. (1974). “Recent advances in predictive coding-applications to speech synthesis,” in Speech Communication: Proceedings of the Speech Communication Seminar, Stockholm, G. Fant, ed. (John Wiley, New York), pp. 27–31.

  • Beauchamp, J. W. (1974). “Time-variant spectra of violin tones,” J. Acoust. Soc. Am. 56(3), 995–1004.

  • Beauchamp, J. W. (1975). “Analysis and synthesis of cornet tones using nonlinear interharmonic relationships,” J. Audio Eng. Soc. 23(10), 778–795.

  • Beauchamp, J. W. (1979). “Practical sound synthesis using a nonlinear processor (waveshaper) and a high pass filter,” Computer Music J. 3(3), 42–49.

  • Beauchamp, J. W. (1980). “Analysis of simultaneous mouthpiece and output waveforms of wind instruments,” 66th Conv. Audio Engineering Soc., Los Angeles, Audio Eng. Soc. Preprint 1626.

  • Beauchamp, J. W. (1982). “Synthesis by spectral amplitude and ‘brightness’ matching of analyzed musical instrument tones,” J. Audio Eng. Soc. 30(6), 396–406.

  • Benade, A. H. (1976). Fundamentals of Musical Acoustics (Oxford University Press, New York).

  • Bennett, G., and Rodet, X. (1989). “Synthesis of the Singing Voice,” in Current Directions in Computer Music Research, M. V. Mathews and J. R. Pierce, eds. (MIT Press, Cambridge, MA), pp. 19–44.

  • Bogert, B., Healy, M., and Tukey, J. (1963). “The Quefrency Alanysis of Time Series for Echoes,” Proc. Symp. on Time Series Analysis, M. Rosenblatt, ed. (J. Wiley, New York), Ch. 15, pp. 209–243.

  • Campedel-Oudot, M., Cappé, O., and Moulines, E. (2001). “Estimation of the spectral envelope of voiced sounds using a penalized likelihood criterion,” IEEE Trans. on Speech and Audio Processing 9(5), 469–481.

  • Chandra, S., and Lin, W. C. (1974). “Experimental comparison between stationary and non-stationary formulations of linear prediction applied to voiced speech,” IEEE Trans. Acoustics, Speech Signal Processing ASSP-22, 403–415.

  • Depalle, P. (1991). “Analyse, modélisation et synthèse des sons basées sur le modèle source/filtre [Analysis, modeling, and synthesis of sounds based on the source/filter model],” Doctoral dissertation, Université du Maine, Le Mans, France.

  • Depalle, P., Garcia, G., and Rodet, X. (1993). “Tracking of partials for additive sound synthesis using hidden Markov models,” Proc. 1993 Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA-1993), New Paltz, NY (IEEE, New York), pp. 225–228.

  • Ding, Y. and Qian, X. (1997). “Sinusoidal and residual decomposition and residual modeling of musical tones using the QUASAR signal model,” Proc. 1997 Int. Computer Music Conf., Thessaloniki, Greece (Int. Computer Music Assoc., San Francisco), pp. 35–42.

  • Dubnov, S., and Rodet, X. (1997). “Statistical modeling of sound aperiodicities,” Proc. 1997 Int. Computer Music Conf., Thessaloniki, Greece (Int. Computer Music Assoc., San Francisco), pp. 43–50.

  • El-Jaroudi, A., and Makhoul, J. (1991). “Discrete all-pole modeling,” IEEE Trans. Signal Processing 39, 411–423.

  • Fant, G. (1970). Acoustic Theory of Speech Production with Calculations Based on X-Ray Studies of Russian Articulations (Mouton, The Hague).

  • Flanagan, J. L. (1972). Speech Analysis, Synthesis and Perception (Springer-Verlag, Berlin).

  • Fletcher, N. H. and Tarnopolsky, A. (1999). “Blowing pressure, power, and spectrum in trumpet playing,” J. Acoust. Soc. Am. 105(2), Pt. 1, 874–881.

  • Freed, A., Rodet, X., and Depalle, P. (1993). “Synthesis and control of hundreds of sinusoidal partials on a desktop computer without custom hardware,” Proc. 1993 Int. Computer Music Conf., Tokyo, Japan (Int. Computer Music Assoc., San Francisco), pp. 98–101.

  • Freed, A. (1995). “Bring your own control to additive synthesis,” Proc. 1995 Int. Computer Music Conf., Banff, Canada (Int. Computer Music Assoc., San Francisco), pp. 303–306.

  • Freed, A. (1999). “Spectral line broadening with transform domain additive synthesis,” Proc. 1999 Int. Computer Music Conf., Beijing, China (Int. Computer Music Assoc., San Francisco), pp. 78–81.

  • Fitz, K., Haken, L., and Holloway, B. (1995). “Lemur—A tool for timbre manipulation,” Proc. 1995 Int. Computer Music Conf., Banff, Canada (Int. Computer Music Assoc., San Francisco), pp. 158–161.

  • Galas, T. and Rodet, X. (1990). “An improved cepstral method for deconvolution of source–filter systems with discrete spectra: Application to musical sound signals,” Proc. 1990 Int. Computer Music Conf., Glasgow, Scotland (Int. Computer Music Assoc., San Francisco), pp. 82–84.

  • Galas, T., and Rodet, X. (1991a). “Generalized discrete cepstral analysis for deconvolution of source–filter systems with discrete spectra,” Final Program and Paper Summaries: 1991 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA-1991), New Paltz, NY (IEEE, New York), Paper No. 3.2.

  • Galas, T. and Rodet, X. (1991b). “Generalized functional approximation for source–filter system modeling,” Proc. 1991 European Conf. on Speech Communication and Technology, Genoa, Italy, pp. 1085–1088.

  • Giron, F. (1990). “Analyse et synthèse de sons de Shakuhachi [Analysis and synthesis of shakuhachi sounds],” DEA internship report in Acoustics (rapport de stage de DEA d'Acoustique), Université du Maine, IRCAM, October 1990.

  • Goodwin, M. (1996). “Residual modeling in music analysis-synthesis,” Proc. 1996 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP ′96), Atlanta, GA, (IEEE, New York), pp. 1005–1008.

  • Griffin, D. W., and Lim, J. S. (1985). “A new model-based speech analysis/synthesis system,” Proc. 1985 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP ′85), Tampa, FL (IEEE, New York), pp. 513–516.

  • Hamming, R. W. (1977). Digital Filters (Prentice-Hall, Englewood Cliffs, NJ).

  • Harris, F. J. (1978). “On the use of windows for harmonic analysis with the discrete Fourier transform,” Proc. IEEE 66(1), 51–82.

  • Holmes, J. N. (1983). “Formant synthesizers: Cascade or parallel,” Speech Communication 2(4), 251–273.

  • Horner, A. and Beauchamp, J. W. (1995). “Synthesis of trumpet tones using a wavetable and a dynamic filter,” J. Audio Eng. Soc. 43(10), 799–812.

  • Horner, A., and Beauchamp, J. W. (1996). “Piecewise linear approximation of additive synthesis envelopes: A comparison of various methods,” Computer Music J. 20(2), 72–95.

  • Itakura, F. (1975). “Line spectrum representation of linear predictive coefficients of speech signals,” J. Acoust. Soc. Am. 57, 535 (abstract).

  • Kay, S. M. (1988). Modern Spectral Estimation: Theory and Application (Prentice Hall, Englewood Cliffs, NJ).

  • Klatt, D. H. (1980). “Software for cascade/parallel formant synthesizer,” J. Acoust. Soc. Am. 67, 971–995.

  • Kopec, G. E. (1986). “Formant tracking using hidden Markov models and vector quantization,” IEEE Trans. on Acoustics, Speech and Signal Processing 34, 709–729.

  • Laroche, J., Stylianou, Y., and Moulines, E. (1993). “HNS: Speech modification based on a harmonic + noise model,” Proc. 1993 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP ′93), Minneapolis, MN, Vol. 2 (IEEE, New York), pp. 550–553.

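  • Laura, C. and Rodet, X. (1989). “Appariement de Pics Spectraux et règles pour la synthèse de la parole par concaténation de diphones [Spectral peak matching and rules for speech synthesis by diphone concatenation],” Actes du 1er Congrès français d'acoustique [Proceedings of the 1st French Congress on Acoustics], Lyon, France, pp. 531–536.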

  • Maher, R. C. and Beauchamp, J. W. (1990). “An investigation of vocal vibrato for synthesis,” Applied Acoustics 30(4), 219–245.

  • Makhoul, J. (1975). “Linear prediction: A tutorial review,” Proc. IEEE 63(4), 561–580.

  • Marin, C. and McAdams, S. (1991). “Segregation of concurrent sounds. II: Effects of spectral envelope tracing, frequency modulation coherence, and frequency modulation width,” J. Acoust. Soc. Am. 89, 341–351.

  • Markel, J. D. and Gray, A. H., Jr. (1980). Linear Prediction of Speech (Springer-Verlag, Berlin).

  • Massie, D. C. and Stonick, V. L. (1992). “The musical intrigue of pole-zero pairs,” Proc. 1992 Int. Computer Music Conf., San Jose, CA (Int. Computer Music Assoc., San Francisco), pp. 22–25.

  • McAdams, S. and Rodet, X. (1988). “The role of FM-induced AM in dynamic spectral profile analysis,” Basic Issues in Hearing: Proc. 8th Int. Symposium on Hearing, H. Duifhuis, J. Horst, and H. Wit, eds. (Academic Press, London), pp. 359–369.

  • McAulay, R. J. and Quatieri, T. F. (1995). “Sinusoidal coding,” in Speech Coding and Synthesis, W. B. Kleijn and K. K. Paliwal, eds. (Elsevier Science, Amsterdam), pp. 121–173.

  • McCandless, S. S. (1974). “An algorithm for automatic formant extraction using linear prediction spectra,” IEEE Trans. on Acoustics, Speech and Signal Processing ASSP-22, 135–141.

  • Mellody, M. and Wakefield, G. H. (1997). “A modal distribution study of violin vibrato,” Proc. 1997 Int. Computer Music Conf., Thessaloniki, Greece (Int. Computer Music Assoc., San Francisco), pp. 465–468.

  • Mellody, M. and Wakefield, G. H. (2000). “The time-frequency characteristics of violin vibrato: Modal distribution analysis and synthesis,” J. Acoust. Soc. Am. 107(1), 598–611.

  • Moore, B. C. J. (1997). An Introduction to the Psychology of Hearing (Academic Press, San Diego).

  • Moore, F. R. (1990). Elements of Computer Music (Prentice Hall, Englewood Cliffs, NJ).

  • Moorer, J. A. (1979). “The Use of Linear Prediction of Speech in Computer Music Applications,” J. Audio Eng. Soc. 27(3), 134–140.

  • Niranjan, M. and Cox, I. J. (1994). “Recursive tracking of formants in speech signals,” Proc. 1994 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP ′94), Adelaide, South Australia, Vol. 2 (IEEE, New York), pp. 205–208.

  • Olive, J. P. (1971). “Automatic formant tracking by a Newton-Raphson technique,” J. Acoust. Soc. Am. 50(2), 661–670.

  • Oppenheim, A. V. (1978). “Digital processing of speech,” in Applications of Digital Signal Processing, A. V. Oppenheim, ed. (Prentice-Hall, Englewood Cliffs, NJ), pp. 117–168.

  • Oppenheim, A. V., and Schafer, R. W. (1975). Digital Signal Processing (Prentice-Hall, Englewood Cliffs, NJ).

  • Oudot, M., Cappé, O., and Moulines, E. (1997). “Robust estimation of the spectral envelope for ‘harmonics+noise’ models,” Proc. 1997 IEEE Workshop on Speech Coding for Telecommunications, Pocono Manor, PA (IEEE, New York), pp. 11–12.

  • Oudot, M. (1998). “Analyse/synthèse des signaux de parole à partir d'un modèle de sinusoïdes et de bruit. Application au codage bas débit et aux transformations prosodiques [Speech analysis/synthesis using harmonic sinewaves and noise. Application to low-bit-rate coding and prosodic transformations],” École Nationale Supérieure des Télécommunications.

  • Peeters, G. and Rodet, X. (1998). “Signal characterization in terms of sinusoidal and non-sinusoidal components,” Proc. First COST-G6 Workshop on Digital Audio Effects (DAFX98), Barcelona, Spain.

  • Pierucci, P., and Paladin, A. (1997). “Singing voice analysis and synthesis system through glottal excited formant resonators,” Proc. 1997 Int. Computer Music Conf., Thessaloniki, Greece (Int. Computer Music Assoc., San Francisco), pp. 168–171.

  • Potard, Y., Baisnée, P. F., and Barrière, J. B. (1986). “Experimenting with models of resonance produced by a new technique for the analysis of impulsive sounds,” Proc. 1986 Int. Computer Music Conf., The Hague, Netherlands (Int. Computer Music Assoc., San Francisco), pp. 269–274.

  • Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992). Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. (Cambridge University Press, New York).

  • Risset, J.-C., and Mathews, M. V. (1969). “Analysis of musical-instrument tones,” Physics Today 22(2), 23–30.

  • Rodet, X. and Delatre, J. (1979). “Time-domain speech synthesis by rules using a flexible and fast signal management system,” Proc. 1979 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP ′79), Washington, D.C. (IEEE, New York), pp. 895–898.

  • Rodet, X. (1980). “Time-domain formant-wave-function synthesis,” Spoken Language Generation and Understanding: Proc. NATO Advanced Study Institute, Bonas, France, J.C. Simon, ed. (D. Reidel Pub. Co., Dordrecht, Holland), pp. 429–441.

  • Rodet, X. (1984). “Time-domain formant-wave-function synthesis,” Computer Music J. 8(3), 9–14.

  • Rodet, X., Potard, Y., and Barrière, J. B. (1984). “The Chant Project: From synthesis of the singing voice to synthesis in general,” Computer Music J. 8(3), 15–31.

  • Rodet, X., and Depalle, P. (1985). “Synthesis by rule: LPC diphones and calculation of formant trajectories,” Proc. 1985 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP ′85), Tampa, FL, Vol. 2 (IEEE, New York), pp. 736–739.

  • Rodet, X., and Depalle, P. (1986). “Use of LPC spectral estimation for music analysis, processing and synthesis,” Final Program and Paper Summaries for the 1986 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA-1986), New Paltz, NY (IEEE, New York), Paper No. 5.5.

  • Rodet, X., Depalle, P., and Poirot, G. (1987). “Speech analysis and synthesis methods based on spectral envelopes and voiced/unvoiced functions,” European Conf. on Speech Technology, 1987, Edinburgh, U.K., pp. 155–158.

  • Rodet, X., Depalle, P., and Poirot, G. (1988). “Diphone sound synthesis based on spectral envelopes and harmonic/noise excitation functions,” Proc. 1988 Int. Computer Music Conf., Cologne, Germany (Int. Computer Music Assoc., San Francisco), pp. 313–321.

  • Rodet, X. and Depalle, P. (1992). “Spectral envelopes and inverse FFT synthesis,” 93rd Convention of the Audio Eng. Soc., San Francisco, CA, Audio Eng. Soc. Preprint No. 3393.

  • Rodet, X., Depalle, P., and Garcia, G. (1995). “New possibilities in sound analysis and synthesis,” Proc. 1995 Int. Symposium on Musical Acoustics, Dourdan, France.

  • Rodet, X., and Lefèvre, A. (1997). “The Diphone program: New features, new synthesis methods and experience of musical use,” Proc. 1997 Int. Computer Music Conf., Thessaloniki, Greece (Int. Computer Music Assoc., San Francisco), pp. 418–421.

  • Sandler, M. B. (1989). “Auto Regressive modelling and synthesis of acoustic instruments,” 86th Convention of the Audio Eng. Soc., Hamburg, Germany, Audio Eng. Soc. Preprint 2758.

  • Schafer, R. W. and Rabiner, L. R. (1970). “System for automatic formant analysis of voiced speech,” J. Acoust. Soc. Am. 47, 634–648.

  • Schwarz, D. (1998). Spectral Envelopes in Sound Analysis and Synthesis, Diploma thesis (Diplomarbeit) No. 1622, Universität Stuttgart, Fakultät Informatik, Stuttgart, Germany, June 1998.

  • Schwarz, D. and Rodet, X. (1999). “Spectral envelope estimation and representation for sound analysis-synthesis,” Proc. 1999 Int. Computer Music Conf., Beijing, China (Int. Computer Music Assoc., San Francisco), pp. 351–354.

  • Serra, X. (1989). “A System for Sound Analysis/Transformation/Synthesis Based on a Deterministic plus Stochastic Decomposition,” Doctoral dissertation, Stanford University, Stanford, CA. Dissertation Abstracts International-A. 51/01, 18.

  • Serra, X. and Smith, J. O. (1990). “Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition,” Computer Music J. 14(4), 12–24.

  • Serra, X., Bonada, J., Herrera, P., and Loureiro, R. (1997). “Integrating complementary spectral models in the design of a musical synthesizer,” Proc. 1997 Int. Computer Music Conf., Thessaloniki, Greece (Int. Computer Music Assoc., San Francisco), pp. 152–159.

  • Smith, J. O. (1985). “Introduction to digital filter theory,” in Digital Audio Signal Processing: An Anthology, J. Strawn, ed. (William Kaufmann, Los Altos, CA), pp. 69–135. Also available as Stanford University Dept. of Music Technical Report STAN-M-20.

  • Soong, F. K., and Juang, B.-H. (1984). “Line spectrum pair (LSP) and speech data compression,” Proc. 1984 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP ′84), San Diego, CA (IEEE, New York), pp. 1.10.1–1.10.4.

  • Unser, M., Aldroubi, A., and Eden, M. (1993). “B-spline signal processing: I—Theory,” IEEE Trans. Signal Processing 41, 821–833.

  • Virolle, D., Schwarz, D., and Rodet, X. (2001). SDIF: Sound Description Interchange Format. [see http://recherche.ircam.fr/sdif/].

  • Viswanathan, R., and Makhoul, J. (1978). “Adaptive lattice methods for linear prediction,” Proc. 1978 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP ′78), Tulsa, OK (IEEE, New York), pp. 83–86.

  • Wanderley, M. M., Schnell, N., and Rovan, J. (1998). “ESCHER—Modeling and performing composed instruments in real-time,” Proc. 1998 IEEE Symposium on Systems, Man, and Cybernetics, San Diego, CA (IEEE, New York), pp. 1080–1084.

  • Wright, M., Chaudhary, A., Freed, A., Wessel, D., Rodet, X., Virolle, D., Woehrmann, R., and Serra, X. (1998). “New applications of the sound description interchange format,” Proc. 1998 Int. Computer Music Conf., Ann Arbor, MI (Int. Computer Music Assoc., San Francisco), pp. 276–279.

Copyright information

© 2007 Springer

About this chapter

Cite this chapter

Rodet, X., Schwarz, D. (2007). Spectral Envelopes and Additive + Residual Analysis/Synthesis. In: Beauchamp, J.W. (ed.) Analysis, Synthesis, and Perception of Musical Sounds. Modern Acoustics and Signal Processing. Springer, New York, NY. https://doi.org/10.1007/978-0-387-32576-7_5