Abstract
The analysis and recognition of sounds in complex auditory scenes is a fundamental step towards context-awareness in machines, and thus an enabling technology for applications across multiple domains including robotics, human-computer interaction, surveillance and bioacoustics. In the realm of music, endowing computers with listening and analytical skills can aid the organization and study of large music collections, the creation of music recommendation services and personalized radio streams, the automation of tasks in the recording studio or the development of interactive music systems for performance and composition.
In this chapter, we survey common techniques for the automatic recognition of timbral, rhythmic and tonal information from recorded music, and for characterizing the similarities that exist between musical pieces. We explore the assumptions behind these methods and their inherent limitations, and conclude by discussing how current trends in machine learning and signal processing research can shape future developments in the field of machine listening.
Notes
1. Also known as onset detection function, or onset strength signal.
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Bello, J. (2014). Machine Listening of Music. In: Lee, N. (ed) Digital Da Vinci. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0536-2_7
DOI: https://doi.org/10.1007/978-1-4939-0536-2_7
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-0535-5
Online ISBN: 978-1-4939-0536-2
eBook Packages: Computer Science, Computer Science (R0)