Mid-level Representations of Musical Audio Signals for Music Information Retrieval

  • Chapter
Advances in Music Information Retrieval

Part of the book series: Studies in Computational Intelligence (SCI, volume 274)

Abstract

In this chapter, we introduce mid-level representations of music for content-based music information retrieval (MIR). Although low-level features such as spectral and cepstral features have been widely used in audio-based MIR, the need for more musically meaningful representations has recently been recognized. Here, we review attempts to explore new representations of music based on this motivation. Such representations are called mid-level representations because their level of abstraction lies between that of waveform representations and that of MIDI-like symbolic representations.
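As a concrete illustration of this middle ground (a sketch of our own, not code from the chapter), a chromagram collapses an FFT magnitude spectrum onto the 12 pitch classes: more musically meaningful than a raw spectrum, yet far less abstract than a MIDI transcription. The function name `chroma_from_signal` and all parameter choices below are illustrative assumptions.

```python
import numpy as np

def chroma_from_signal(x, sr, n_fft=4096):
    """Toy mid-level feature: fold an FFT magnitude spectrum onto 12 pitch classes."""
    frame = x[:n_fft] * np.hanning(min(len(x), n_fft))
    spec = np.abs(np.fft.rfft(frame, n_fft))
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    chroma = np.zeros(12)
    for mag, f in zip(spec, freqs):
        if f < 27.5 or f > 4186.0:      # restrict to the piano range (A0..C8)
            continue
        midi = 69 + 12 * np.log2(f / 440.0)   # frequency -> MIDI note number
        chroma[int(round(midi)) % 12] += mag  # accumulate into pitch class
    return chroma / (chroma.sum() + 1e-12)   # normalize to a distribution

sr = 22050
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)  # one second of A4
chroma = chroma_from_signal(x, sr)
print(chroma.argmax())  # pitch class 9 = A
```

Note how the representation discards absolute timing and octave information while retaining harmonic content, which is exactly the trade-off that makes chroma-style features useful for tasks such as chord recognition.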




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Kitahara, T. (2010). Mid-level Representations of Musical Audio Signals for Music Information Retrieval. In: Raś, Z.W., Wieczorkowska, A.A. (eds) Advances in Music Information Retrieval. Studies in Computational Intelligence, vol 274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11674-2_4

  • DOI: https://doi.org/10.1007/978-3-642-11674-2_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11673-5

  • Online ISBN: 978-3-642-11674-2
