Abstract
In this chapter, we introduce mid-level representations of music for content-based music information retrieval (MIR). Although low-level features such as spectral and cepstral features have been widely used for audio-based MIR, the need for more musically meaningful representations has recently been recognized. Here, we review attempts to explore new representations of music based on this motivation. Such representations are called mid-level representations because their level of abstraction lies between that of waveform representations and that of MIDI-like symbolic representations.
© 2010 Springer-Verlag Berlin Heidelberg
Kitahara, T. (2010). Mid-level Representations of Musical Audio Signals for Music Information Retrieval. In: Raś, Z.W., Wieczorkowska, A.A. (eds) Advances in Music Information Retrieval. Studies in Computational Intelligence, vol 274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11674-2_4