Spatial Audio Coding and MPEG Surround

  • Christof Faller

22.1 Introduction

Surround sound has been widely used in cinemas for decades, but its wide adoption in home cinemas was only recently enabled by the digital video disc (DVD). While a compact disc (CD) stores its stereo audio content uncompressed as 16-bit PCM at a sampling rate of 44.1 kHz, the DVD stores its six surround channels (five main audio channels plus a low-frequency effects channel) compressed with the perceptual audio coder AC-3 from Dolby Laboratories. Stored uncompressed, the multichannel surround signal would take up too much space on the DVD, so it is compressed.
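To make the storage argument concrete, the sketch below computes raw PCM bitrates. It assumes, for illustration only, that each of the six surround channels uses the CD's 16-bit, 44.1 kHz format; actual DVD audio parameters may differ, and the helper name `pcm_bitrate_kbps` is hypothetical.

```python
# Raw PCM bitrate = sample rate x bits per sample x number of channels.
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Return the uncompressed PCM bitrate in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


# CD audio as described in the text: stereo, 16-bit, 44.1 kHz.
cd_stereo = pcm_bitrate_kbps(44_100, 16, 2)

# Assumption for illustration: 5 main channels + LFE in the same PCM format.
surround_6ch = pcm_bitrate_kbps(44_100, 16, 6)

# Storage for one hour of the six-channel signal, in gigabytes (1e9 bytes).
gb_per_hour = surround_6ch * 1000.0 * 3600 / 8 / 1e9

print(f"CD stereo PCM:  {cd_stereo:.1f} kbit/s")
print(f"6-channel PCM:  {surround_6ch:.1f} kbit/s")
print(f"One hour, 6 ch: {gb_per_hour:.2f} GB")
```

Under these assumptions the six-channel signal needs three times the CD bitrate, about 4.2 Mbit/s, or roughly 1.9 GB per hour for the audio alone.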

The most prominent perceptual audio coder is MP3 (MPEG-1 Layer 3). Because it is a relatively old coder, it supports at most two audio channels. Many other perceptual audio coders can also code multichannel surround signals, typically achieving a compression ratio of roughly 10. Spatial audio coding is motivated by the need to code multichannel audio at lower bitrates. It enables coding of multichannel audio at...
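The compression ratio of roughly 10 quoted above can be turned into approximate coded bitrates. This is a sketch under simplifying assumptions: 16-bit, 44.1 kHz PCM per channel, the low-frequency effects channel counted as if it were a full-band channel, and a uniform ratio of exactly 10. The helper name `coded_bitrate_kbps` is hypothetical.

```python
def coded_bitrate_kbps(pcm_kbps: float, compression_ratio: float) -> float:
    """Approximate bitrate after perceptual coding at a given compression ratio."""
    return pcm_kbps / compression_ratio


# Assumed per-channel PCM rate: 44.1 kHz x 16 bit = 705.6 kbit/s.
per_channel_pcm = 44_100 * 16 / 1000.0

for label, channels in [("stereo", 2), ("5.1 surround", 6)]:
    pcm = per_channel_pcm * channels
    coded = coded_bitrate_kbps(pcm, 10.0)
    print(f"{label:>13}: {pcm:7.1f} kbit/s PCM -> ~{coded:6.1f} kbit/s coded")
```

The result (about 141 kbit/s for stereo versus about 423 kbit/s for six channels) shows that with conventional perceptual coding the bitrate still grows in proportion to the number of channels; this is the context in which spatial audio coding targets lower multichannel bitrates.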





Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Christof Faller, Audiovisual Communications Laboratory, EPFL, Lausanne, Switzerland
