Abstract
In this chapter, we introduce mid-level representations of music for content-based music information retrieval (MIR). Although low-level features such as spectral and cepstral features have been widely used for audio-based MIR, the need for more musically meaningful representations has recently been recognized. Here, we review attempts to explore new representations of music based on this motivation. Such representations are called mid-level representations because their level of abstraction lies between that of waveform representations and that of MIDI-like symbolic representations.
© 2010 Springer-Verlag Berlin Heidelberg
Kitahara, T. (2010). Mid-level Representations of Musical Audio Signals for Music Information Retrieval. In: Raś, Z.W., Wieczorkowska, A.A. (eds) Advances in Music Information Retrieval. Studies in Computational Intelligence, vol 274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11674-2_4