
Audio Content Descriptors of Timbre

Chapter in Timbre: Acoustics, Perception, and Cognition

Part of the book series: Springer Handbook of Auditory Research (SHAR, volume 69)

Abstract

This chapter introduces acoustic modeling of timbre with the audio descriptors commonly used in music, speech, and environmental sound studies. These descriptors derive from different representations of sound, ranging from the waveform to sophisticated time-frequency transforms, and each representation is suited to a specific aspect of sound description, depending on the information it captures. Auditory models of both temporal and spectral information can be related to aspects of timbre perception, whereas the excitation-filter model provides links to the acoustics of sound production. A brief review of the most common audio-signal representations used to extract timbre-related descriptors is followed by a discussion of the descriptor extraction process based on those representations. The chapter covers traditional temporal and spectral descriptors, including harmonic description, time-varying descriptors, and techniques for descriptor selection and decomposition. The discussion focuses on conceptual aspects of the acoustic modeling of timbre and on the relationship between descriptors and timbre perception, semantics, and cognition, with illustrative examples. The applications covered range from timbre psychoacoustics and multimedia description to computer-aided orchestration and sound morphing. The chapter concludes with speculation on the role of deep learning in the future of timbre description and on the challenges facing audio content descriptors of timbre.
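As an illustration of the descriptor extraction process summarized above, the minimal Python sketch below computes one widely used spectral descriptor, the spectral centroid, frame by frame from a short-time Fourier magnitude spectrum. The test signal, frame length, and hop size are illustrative assumptions and are not values taken from the chapter.

import numpy as np

def spectral_centroid(x, sr, frame_len=2048, hop=512):
    # Per-frame spectral centroid (in Hz): the amplitude-weighted mean
    # frequency of the magnitude spectrum of each Hann-windowed frame.
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)  # bin center frequencies (Hz)
    centroids = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))            # magnitude spectrum of the frame
        total = mag.sum()
        centroids.append(np.sum(freqs * mag) / total if total > 0 else 0.0)
    return np.array(centroids)

# Illustrative test signal (an assumption, not an example from the chapter):
# a 1-second harmonic tone with three partials at 440, 880, and 1320 Hz.
sr = 44100
t = np.arange(0, 1.0, 1.0 / sr)
x = (np.sin(2 * np.pi * 440 * t)
     + 0.5 * np.sin(2 * np.pi * 880 * t)
     + 0.25 * np.sin(2 * np.pi * 1320 * t))
print("mean spectral centroid: %.1f Hz" % spectral_centroid(x, sr).mean())

Time-varying descriptors of this kind are typically summarized over the duration of a sound; the time-averaged spectral centroid, for instance, is a standard acoustic correlate of perceived brightness in timbre research.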



Author information

Correspondence to Marcelo Caetano.




Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Caetano, M., Saitis, C., Siedenburg, K. (2019). Audio Content Descriptors of Timbre. In: Siedenburg, K., Saitis, C., McAdams, S., Popper, A., Fay, R. (eds) Timbre: Acoustics, Perception, and Cognition. Springer Handbook of Auditory Research, vol 69. Springer, Cham. https://doi.org/10.1007/978-3-030-14832-4_11
