Machine Listening of Music

Bello, Juan Pablo

doi:10.1007/978-1-4939-0536-2_7

Abstract

The analysis and recognition of sounds in complex auditory scenes is a fundamental step towards context-awareness in machines, and thus an enabling technology for applications across multiple domains including robotics, human-computer interaction, surveillance and bioacoustics. In the realm of music, endowing computers with listening and analytical skills can aid the organization and study of large music collections, the creation of music recommendation services and personalized radio streams, the automation of tasks in the recording studio or the development of interactive music systems for performance and composition.

In this chapter, we survey common techniques for the automatic recognition of timbral, rhythmic and tonal information from recorded music, and for characterizing the similarities that exist between musical pieces. We explore the assumptions behind these methods and their inherent limitations, and conclude by discussing how current trends in machine learning and signal processing research can shape future developments in the field of machine listening.

Notes

1. Also known as the onset detection function or onset strength signal.

Author information

Authors and Affiliations

Music and Audio Research Laboratory (MARL), New York University, New York, USA
Juan Pablo Bello

Corresponding author

Correspondence to Juan Pablo Bello.

Editor information

Editors and Affiliations

Newton Lee Laboratories, LLC, Tujunga, California, USA
Newton Lee

About this chapter

Cite this chapter

Bello, J. (2014). Machine Listening of Music. In: Lee, N. (ed) Digital Da Vinci. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0536-2_7

DOI: https://doi.org/10.1007/978-1-4939-0536-2_7
Published: 12 April 2014
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-0535-5
Online ISBN: 978-1-4939-0536-2
eBook Packages: Computer Science, Computer Science (R0)
