Abstract
The subject of this chapter is the estimation, representation, modification, and use of spectral envelopes in the context of sinusoidal-additive-plus-residual analysis/synthesis. A spectral envelope is an amplitude-vs-frequency function, which may be obtained from the envelope of a short-time spectrum (Rodet et al., 1987; Schwarz, 1998). [Precise definitions of such an envelope and short-time spectrum (STS) are given in Section 2.] The additive-plus-residual analysis/synthesis method is based on a representation of signals in terms of a sum of time-varying sinusoids and of a non-sinusoidal residual signal [e.g., see Serra (1989), Laroche et al. (1993), McAulay and Quatieri (1995), and Ding and Qian (1997)]. Many musical sound signals may be described as a combination of a nearly periodic waveform and colored noise. The nearly periodic part of the signal can be viewed as a sum of sinusoidal components, called partials, with time-varying frequency and amplitude. Such sinusoidal components are easily observed on a spectral analysis display (Fig. 5.1) as obtained, for instance, from a discrete Fourier transform.
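As a rough illustration of the signal model described above, the following sketch builds a test tone as a sum of harmonic partials plus a noise residual, reads the partial amplitudes off a discrete Fourier transform, and joins them into an amplitude-vs-frequency curve. It is a minimal sketch, not the chapter's method: the function names, the 1/k amplitude roll-off, the white-noise residual, and the linear interpolation between partial peaks are all assumptions made for this example. The estimation techniques actually treated in the chapter are not reproduced here.

```python
import numpy as np

def synthesize(f0=220.0, n_partials=10, sr=16000, dur=0.5, noise_level=0.01, seed=0):
    """Sum of harmonic sinusoids plus white noise (a stand-in for the residual)."""
    t = np.arange(int(sr * dur)) / sr
    k = np.arange(1, n_partials + 1)
    amps = 1.0 / k  # assumed 1/k roll-off, chosen only for illustration
    partials = (amps[:, None] * np.sin(2 * np.pi * f0 * k[:, None] * t)).sum(axis=0)
    residual = noise_level * np.random.default_rng(seed).standard_normal(t.size)
    return partials + residual, amps

def measure_partials(x, f0, n_partials, sr, n_win=4096, zero_pad=4):
    """Pick the largest DFT magnitude near each expected harmonic frequency."""
    window = np.hanning(n_win)
    # Zero-padding samples the spectrum more finely, so a peak is less likely
    # to fall between two bins.
    spec = np.abs(np.fft.rfft(x[:n_win] * window, n=zero_pad * n_win))
    mag = 2.0 * spec / window.sum()  # scale so a sinusoid of amplitude A peaks near A
    freqs = np.fft.rfftfreq(zero_pad * n_win, d=1.0 / sr)
    peak_f, peak_a = [], []
    for k in range(1, n_partials + 1):
        band = np.flatnonzero(np.abs(freqs - k * f0) < f0 / 2)
        i = band[np.argmax(mag[band])]
        peak_f.append(freqs[i])
        peak_a.append(mag[i])
    return np.array(peak_f), np.array(peak_a)

def envelope(query_freqs, peak_f, peak_a):
    """Crude spectral envelope: linear interpolation between partial peaks."""
    return np.interp(query_freqs, peak_f, peak_a)

if __name__ == "__main__":
    x, true_amps = synthesize()
    pf, pa = measure_partials(x, f0=220.0, n_partials=10, sr=16000)
    for f, a in zip(pf, pa):
        print(f"{f:8.1f} Hz  amplitude {a:.3f}")
    print("envelope at 330 Hz:", envelope(np.array([330.0]), pf, pa)[0])
```

The fundamental frequency is passed in here rather than estimated, which sidesteps partial tracking entirely. A real analysis would also have to follow partials whose frequency and amplitude vary over time, frame by frame.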
References
Allen, J., Hunnicutt, M. S., and Klatt, D. (1987). From Text to Speech, The MITalk System (Cambridge University Press, New York).
Atal, B. S. and Hanauer, S. L. (1971). “Speech analysis and synthesis by linear prediction of the speech wave,” J. Acoust. Soc. Am. 50, 637–655.
Atal, B. S. (1974). “Recent advances in predictive coding-applications to speech synthesis,” in Speech Communication: Proceedings of the Speech Communication Seminar, Stockholm, G. Fant, ed. (John Wiley, New York), pp. 27–31.
Beauchamp, J. W. (1974). “Time-variant spectra of violin tones,” J. Acoust. Soc. Am. 56(3), 995–1004.
Beauchamp, J. W. (1975). “Analysis and synthesis of cornet tones using nonlinear interharmonic relationships,” J. Audio Eng. Soc. 23(10), 778–795.
Beauchamp, J. W. (1979). “Practical sound synthesis using a nonlinear processor (waveshaper) and a high pass filter,” Computer Music J. 3(3), 42–49.
Beauchamp, J. W. (1980). “Analysis of simultaneous mouthpiece and output waveforms of wind instruments,” 66th Conv. Audio Engineering Soc., Los Angeles, Audio Eng. Soc. Preprint 1626.
Beauchamp, J. W. (1982). “Synthesis by spectral amplitude and ‘brightness’ matching of analyzed musical instrument tones,” J. Audio Eng. Soc. 30(6), 396–406.
Benade, A. H. (1976). Fundamentals of Musical Acoustics (Oxford University Press, New York).
Bennett, G., and Rodet, X. (1989). “Synthesis of the Singing Voice,” in Current Directions in Computer Music Research, M. V. Mathews and J. R. Pierce, eds. (MIT Press, Cambridge, MA), pp. 19–44.
Bogert, B., Healy, M., and Tukey, J. (1963). “The Quefrency Alanysis of Time Series for Echoes,” Proc. Symp. on Time Series Analysis, M. Rosenblatt, ed. (J. Wiley, New York), Ch. 15, pp. 209–243.
Campedel-Oudot, M., Cappé, O., and Moulines, E. (2001). “Estimation of the spectral envelope of voiced sounds using a penalized likelihood criterion,” IEEE Trans. on Speech and Audio Processing 9(5), 469–481.
Chandra, S., and Lin, W. C. (1974). “Experimental comparison between stationary and non-stationary formulations of linear prediction applied to voiced speech,” IEEE Trans. on Acoustics, Speech and Signal Processing ASSP-22, 403–415.
Depalle, P. (1991). “Analyse, modélisation et synthèse des sons basées sur le modèle source/filtre,” Doctoral dissertation, Université du Maine, Le Mans, France.
Depalle, P., Garcia, G., and Rodet, X. (1993). “Tracking of partials for additive sound synthesis using hidden Markov models,” Proc. 1993 Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA-1993), New Paltz, NY (IEEE, New York), pp. 225–228.
Ding, Y. and Qian, X. (1997). “Sinusoidal and residual decomposition and residual modeling of musical tones using the QUASAR signal model,” Proc. 1997 Int. Computer Music Conf., Thessaloniki, Greece (Int. Computer Music Assoc., San Francisco), pp. 35–42.
Dubnov, S., and Rodet, X. (1997). “Statistical modeling of sound aperiodicities,” Proc. 1997 Int. Computer Music Conf., Thessaloniki, Greece (Int. Computer Music Assoc., San Francisco), pp. 43–50.
El-Jaroudy, A., and Makhoul, J. (1991). “Discrete all-pole modeling,” IEEE Trans. Signal Processing 39, 411–423.
Fant, G. (1970). Acoustic Theory of Speech Production with Calculations Based on X-Ray Studies of Russian Articulations (Mouton, The Hague).
Flanagan, J. L. (1972). Speech Analysis, Synthesis and Perception (Springer-Verlag, Berlin).
Fletcher, N. H. and Tarnopolsky, A. (1999). “Blowing pressure, power, and spectrum in trumpet playing,” J. Acoust. Soc. Am. 105(2), Pt. 1, 874–881.
Freed, A., Rodet, X., and Depalle, P. (1993). “Synthesis and control of hundreds of sinusoidal partials on a desktop computer without custom hardware,” Proc. 1993 Int. Computer Music Conf., Tokyo, Japan (Int. Computer Music Assoc., San Francisco), pp. 98–101.
Freed, A. (1995). “Bring your own control to additive synthesis,” Proc. 1995 Int. Computer Music Conf., Banff, Canada (Int. Computer Music Assoc., San Francisco), pp. 303–306.
Freed, A. (1999). “Spectral line broadening with transform domain additive synthesis,” Proc. 1999 Int. Computer Music Conf., Beijing, China (Int. Computer Music Assoc., San Francisco), pp. 78–81.
Fitz, K., Haken, L., and Holloway, B. (1995). “Lemur—A tool for timbre manipulation,” Proc. 1995 Int. Computer Music Conf., Banff, Canada (Int. Computer Music Assoc., San Francisco), pp. 158–161.
Galas, T. and Rodet, X. (1990). “An improved cepstral method for deconvolution of source–filter systems with discrete spectra: Application to musical sound signals,” Proc. 1990 Int. Computer Music Conf., Glasgow, Scotland (Int. Computer Music Assoc., San Francisco), pp. 82–84.
Galas, T., and Rodet, X. (1991a). “Generalized discrete cepstral analysis for deconvolution of source–filter systems with discrete spectra,” Final Program and Paper Summaries: 1991 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA-1991), New Paltz, NY (IEEE, New York), Paper No. 3.2.
Galas, T. and Rodet, X. (1991b). “Generalized functional approximation for source–filter system modeling,” Proc. 1991 European Conf. on Speech Communication and Technology, Genoa, Italy, pp. 1085–1088.
Giron, F. (1990). “Analyse et synthèse de sons de Shakuhachi,” rapport de stage de DEA d'Acoustique de l'université du Maine, IRCAM, October, 1990.
Goodwin, M. (1996). “Residual modeling in music analysis-synthesis,” Proc. 1996 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP ′96), Atlanta, GA (IEEE, New York), pp. 1005–1008.
Griffin, D. W., and Lim, J. S. (1985). “A new model-based speech analysis/synthesis system,” Proc. 1985 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP ′85), Tampa, FL (IEEE, New York), pp. 513–516.
Hamming, R. W. (1977). Digital Filters (Prentice-Hall, Englewood Cliffs, NJ).
Harris, F. J. (1978). “On the use of windows for harmonic analysis with the discrete Fourier transform,” Proc. IEEE 66(1), 51–82.
Holmes, J. N. (1983). “Formant synthesizers: Cascade or parallel,” Speech Communication 2(4), 251–273.
Horner, A. and Beauchamp, J. W. (1995). “Synthesis of trumpet tones using a wavetable and a dynamic filter,” J. Audio Eng. Soc. 43(10), 799–812.
Horner, A., and Beauchamp, J. W. (1996). “Piecewise linear approximation of additive synthesis envelopes: A comparison of various methods,” Computer Music J. 20(2), 72–95.
Itakura, F. (1975). “Line spectrum representation of linear predictive coefficients of speech signals,” J. Acoust. Soc. Am. 57, 535 (abstract).
Kay, S. M. (1988). Modern Spectral Estimation: Theory and Application (Prentice Hall, Englewood Cliffs, NJ).
Klatt, D. H. (1980). “Software for cascade/parallel formant synthesizer,” J. Acoust. Soc. Am. 67, 971–995.
Kopec, G. E. (1986). “Formant tracking using hidden Markov models and vector quantization,” IEEE Trans. on Acoustics, Speech and Signal Processing 34, 709–729.
Laroche, J., Stylianou, Y., and Moulines, E. (1993). “HNS: Speech modification based on a harmonic + noise model,” Proc. 1993 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP ′93), Minneapolis, MN, Vol. 2 (IEEE, New York), pp. 550–553.
Laura, C. and Rodet, X. (1989). “Appariement de Pics Spectraux et règles pour la synthèse de la parole par concaténation de diphones,” Actes du 1er Congrès français d'acoustique, Lyon, France, pp. 531–536.
Maher, R. C. and Beauchamp, J. W. (1990). “An investigation of vocal vibrato for synthesis,” Applied Acoustics 30(4), 219–245.
Makhoul, J. (1975). “Linear prediction: A tutorial review,” Proc. IEEE 63(4), 561–580.
Marin, C. and McAdams, S. (1991). “Segregation of concurrent sounds. II: Effects of spectral envelope tracing, frequency modulation coherence, and frequency modulation width,” J. Acoust. Soc. Am. 89, 341–351.
Markel, J. D. and Gray, A. H., Jr. (1980). Linear Prediction of Speech (Springer-Verlag, Berlin).
Massie, D. C. and Stonick, V. L. (1992). “The musical intrigue of pole-zero pairs,” Proc. 1992 Int. Computer Music Conf., San Jose, CA (Int. Computer Music Assoc., San Francisco), pp. 22–25.
McAdams, S. and Rodet, X. (1988). “The role of FM-induced AM in dynamic spectral profile analysis,” Basic Issues in Hearing: Proc. 8th Int. Symposium on Hearing, H. Duifhuis, J. Horst, and H. Wit, eds. (Academic Press, London), pp. 359–369.
McAulay, R. J. and Quatieri, T. F. (1995). “Sinusoidal coding,” in Speech Coding and Synthesis, W. B. Kleijn and K.K. Paliwal, eds. (Elsevier Science, Amsterdam), pp. 121–173.
McCandless, S. S. (1974). “An algorithm for automatic formant extraction using linear prediction spectra,” IEEE Trans. on Acoustics, Speech and Signal Processing ASSP-22, 135–141.
Mellody, M. and Wakefield, G. H. (1997). “A modal distribution study of violin vibrato,” Proc. 1997 Int. Computer Music Conf., Thessaloniki, Greece (Int. Computer Music Assoc., San Francisco), pp. 465–468.
Mellody, M. and Wakefield, G. H. (2000). “The time-frequency characteristics of violin vibrato: Modal distribution analysis and synthesis,” J. Acoust. Soc. Am. 107(1), 598–611.
Moore, B. C. J. (1997). An Introduction to the Psychology of Hearing (Academic Press, San Diego).
Moore, F. R. (1990). Elements of Computer Music (Prentice Hall, Englewood Cliffs, NJ).
Moorer, J. A. (1979). “The Use of Linear Prediction of Speech in Computer Music Applications,” J. Audio Eng. Soc. 27(3), 134–140.
Niranjan, M. and Cox, I. J. (1994). “Recursive tracking of formants in speech signals,” Proc. 1994 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP ′94), Adelaide, South Australia, Vol. 2 (IEEE, New York), pp. 205–208.
Olive, J. P. (1971). “Automatic formant tracking by a Newton-Raphson technique,” J. Acoust. Soc. Am. 50(2), 661–670.
Oppenheim, A. V. (1978). “Digital processing of speech,” in Applications of Digital Signal Processing, A. V. Oppenheim, ed. (Prentice-Hall, Englewood Cliffs, NJ), pp. 117–168.
Oppenheim, A. V., and Schafer, R. W. (1975). Digital Signal Processing (Prentice-Hall, Englewood Cliffs, NJ).
Oudot, M., Cappé, O., and Moulines, E. (1997). “Robust estimation of the spectral envelope for ‘harmonics+noise’ models,” Proc. 1997 IEEE Workshop on Speech Coding for Telecommunications, Pocono Manor, PA (IEEE, New York), pp. 11–12.
Oudot, M. (1998). “Analyse/synthèse des signaux de parole à partir d'un modèle de sinusoïdes et de bruit. Application au codage bas débit et aux transformations prosodiques [Speech analysis/synthesis using harmonic sinewaves and noise. Application to low-bit-rate-coding and prosodic transformations],” École Nationale Supérieure des Télécommunications.
Peeters, G. and Rodet, X. (1998). “Signal characterization in terms of sinusoidal and non-sinusoidal components,” Proc. First COST-G6 Workshop on Digital Audio Effects (DAFX98), Barcelona, Spain.
Pierucci, P., and Paladin, A. (1997). “Singing voice analysis and synthesis system through glottal excited formant resonators,” Proc. 1997 Int. Computer Music Conf., Thessaloniki, Greece (Int. Computer Music Assoc., San Francisco), pp. 168–171.
Potard, Y., Baisnée, P. F., and Barrière, J. B. (1986). “Experimenting with models of resonance produced by a new technique for the analysis of impulsive sounds,” Proc. 1986 Int. Computer Music Conf., The Hague, Netherlands (Int. Computer Music Assoc., San Francisco), pp. 269–274.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992). Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. (Cambridge University Press, New York).
Risset, J.-C., and Mathews, M. V. (1969). “Analysis of musical-instrument tones,” Physics Today 22(2), 23–30.
Rodet, X. and Delatre, J. (1979). “Time-domain speech synthesis by rules using a flexible and fast signal management system,” Proc. 1979 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP ′79), Washington, D.C. (IEEE, New York), pp. 895–898.
Rodet, X. (1980). “Time-domain formant-wave-function synthesis,” Spoken Language Generation and Understanding: Proc. NATO Advanced Study Institute, Bonas, France, J.C. Simon, ed. (D. Reidel Pub. Co., Dordrecht, Holland), pp. 429–441.
Rodet, X. (1984). “Time-domain formant-wave-function synthesis,” Computer Music J. 8(3), 9–14.
Rodet, X., Potard, Y., and Barrière, J. B. (1984). “The Chant Project: From synthesis of the singing voice to synthesis in general,” Computer Music J. 8(3), 15–31.
Rodet, X., and Depalle, P. (1985). “Synthesis by rule: LPC diphones and calculation of formant trajectories,” Proc. 1985 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP ′85), Tampa, FL, Vol. 2 (IEEE, New York), pp. 736–739.
Rodet, X., and Depalle, P. (1986). “Use of LPC spectral estimation for music analysis, processing and synthesis,” Final Program and Paper Summaries for the 1986 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA-1986), New Paltz, NY (IEEE, New York), Paper No. 5.5.
Rodet, X., Depalle, P., and Poirot, G. (1987). “Speech analysis and synthesis methods based on spectral envelopes and voiced/unvoiced functions,” European Conf. on Speech Technology, 1987, Edinburgh, U.K., pp. 155–158.
Rodet, X., Depalle, P., and Poirot, G. (1988). “Diphone sound synthesis based on spectral envelopes and harmonic/noise excitation functions,” Proc. 1988 Int. Computer Music Conf., Cologne, Germany (Int. Computer Music Assoc., San Francisco), pp. 313–321.
Rodet, X. and Depalle, P. (1992). “Spectral envelopes and inverse FFT synthesis,” 93rd Convention of the Audio Eng. Soc., San Francisco, CA, Audio Eng. Soc. Preprint No. 3393.
Rodet, X., Depalle, P., and Garcia, G. (1995). “New possibilities in sound analysis and synthesis,” Proc. 1995 Int. Symposium on Musical Acoustics, Dourdan, France.
Rodet, X., and Lefèvre, A. (1997). “The Diphone program: New features, new synthesis methods and experience of musical use,” Proc. 1997 Int. Computer Music Conf., Thessaloniki, Greece (Int. Computer Music Assoc., San Francisco), pp. 418–421.
Sandler, M. B. (1989). “Auto Regressive modelling and synthesis of acoustic instruments,” 86th Convention of the Audio Eng. Soc., Hamburg, Germany, Audio Eng. Soc. Preprint 2758.
Schafer, R. W. and Rabiner, L. R. (1970). “System for automatic formant analysis of voiced speech,” J. Acoust. Soc. Am. 47, 634–648.
Schwarz, D. (1998). Spectral Envelopes in Sound Analysis and Synthesis, Universität Stuttgart, Fakultät Informatik, Diplomarbeit Nr. 1622, Stuttgart, Germany, June 1998.
Schwarz, D. and Rodet, X. (1999). “Spectral envelope estimation and representation for sound analysis-synthesis,” Proc. 1999 Int. Computer Music Conf., Beijing, China (Int. Computer Music Assoc., San Francisco), pp. 351–354.
Serra, X. (1989). “A System for Sound Analysis/Transformation/Synthesis Based on a Deterministic plus Stochastic Decomposition,” Doctoral dissertation, Stanford University, Stanford, CA. Dissertation Abstracts International-A. 51/01, 18.
Serra, X. and Smith, J. O. (1990). “Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition,” Computer Music J. 14(4), 12–24.
Serra, X., Bonada, J., Herrera, P., and Loureiro, R. (1997). “Integrating complementary spectral models in the design of a musical synthesizer,” Proc. 1997 Int. Computer Music Conf., Thessaloniki, Greece (Int. Computer Music Assoc., San Francisco), pp. 152–159.
Smith, J. O. (1985). “Introduction to digital filter theory,” in Digital Audio Signal Processing: An Anthology, J. Strawn, ed. (William Kaufmann, Los Altos, CA), pp. 69–135. Also available as Stanford University Dept. of Music Technical Report STAN-M–20.
Soong, F. K., and Juang, B.-H. (1984). “Line spectrum pair (LSP) and speech data compression,” Proc. 1984 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP ′84), San Diego, CA (IEEE, New York), pp. 1.10.1–1.10.4.
Unser, M., Aldroubi, A., and Eden, M. (1993). “B-spline signal processing: I—Theory,” IEEE Trans. Signal Processing 41, 821–833.
Virolle, D., Schwarz, D., and Rodet, X. (2001). SDIF: Sound Description Interchange Format. [see http://recherche.ircam.fr/sdif/].
Vishwanathan, R., and Makhoul, J. (1978). “Adaptive lattice methods for linear prediction,” Proc. 1978 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP ′78), Tulsa, OK (IEEE, New York), pp. 83–86.
Wanderley, M. M., Schnell, N., and Rovan, J. (1998). “ESCHER—Modeling and performing composed instruments in real-time,” Proc. 1998 IEEE Symposium on Systems, Man, and Cybernetics, San Diego, CA (IEEE, New York), pp. 1080–1084.
Wright, M., Chaudhary, A., Freed, A., Wessel, D., Rodet, X., Virolle, D., Woehrmann, R., and Serra, X. (1998). “New applications of the sound description interchange format,” Proc. 1998 Int. Computer Music Conf., Ann Arbor, MI (Int. Computer Music Assoc., San Francisco), pp. 276–279.
© 2007 Springer
About this chapter
Cite this chapter
Rodet, X., Schwarz, D. (2007). Spectral Envelopes and Additive + Residual Analysis/Synthesis. In: Beauchamp, J.W. (eds) Analysis, Synthesis, and Perception of Musical Sounds. Modern Acoustics and Signal Processing. Springer, New York, NY. https://doi.org/10.1007/978-0-387-32576-7_5
DOI: https://doi.org/10.1007/978-0-387-32576-7_5
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-32496-8
Online ISBN: 978-0-387-32576-7
eBook Packages: Physics and Astronomy (R0)