Abstract
This chapter describes an audio representation that supports time-scale and frequency-scale modifications in a compressed domain. The input audio is separated into three component representations: sinusoids, transients, and noise. Each component can be quantized, time-scaled, and pitch-shifted independently of the others.
Copyright information
© 2007 Springer
About this chapter
Cite this chapter
Levine, S.N., Smith III, J.O. (2007). A Compact and Malleable Sines+Transients+Noise Model for Sound. In: Beauchamp, J.W. (ed.) Analysis, Synthesis, and Perception of Musical Sounds. Modern Acoustics and Signal Processing. Springer, New York, NY. https://doi.org/10.1007/978-0-387-32576-7_4
DOI: https://doi.org/10.1007/978-0-387-32576-7_4
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-32496-8
Online ISBN: 978-0-387-32576-7
eBook Packages: Physics and Astronomy (R0)