On Hierarchical Clustering of Spectrogram

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11265)

Abstract

We propose a new method that applies the Generative Theory of Tonal Music (GTTM) directly to the spectrogram of a piece of music to produce a time-span segmentation as a hierarchical clustering. We first regard a vertically long rectangle of a spectrogram, called a bin, as a pitch event, and the spectrogram as a sequence of bins. The texture features of each bin are extracted using a gray-level co-occurrence matrix, yielding a sequence of texture features. The proximity and change of phrases are calculated from the distance between the texture features of adjacent bins. Global structures such as parallelism and repetition are detected using a self-similarity matrix of the bin sequence. We develop an algorithm that takes the sequence of boundary strengths between adjacent bins, iteratively merges adjacent bins in a bottom-up manner, and finally generates a dendrogram corresponding to a time-span segmentation. In an experiment with Mozart's K.331 and K.550 as input, we obtained promising results, even though the algorithm takes almost no musical knowledge, such as pitch and harmony, into account.
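The feature stage described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the spectrogram is taken to be a (frequency × time) magnitude array, each bin is a fixed-width slice of `bin_width` frames, magnitudes are quantized to `levels` gray levels, the co-occurrence matrix uses a single horizontal pixel offset, and three Haralick-style statistics [7] (contrast, energy, homogeneity) stand in for the texture features. Euclidean distance between neighboring bins stands in for the proximity/change measure, and cosine similarity is assumed for the self-similarity matrix [3]. All function names and parameters are hypothetical.

```python
import numpy as np


def split_into_bins(spec, bin_width):
    """Cut a (freq x time) spectrogram into vertical slices of bin_width frames.
    A trailing partial slice is dropped."""
    n_bins = spec.shape[1] // bin_width
    return [spec[:, k * bin_width:(k + 1) * bin_width] for k in range(n_bins)]


def quantize(img, levels, lo, hi):
    """Map magnitudes in [lo, hi] to integer gray levels 0..levels-1."""
    scaled = (img - lo) / (hi - lo + 1e-12)
    return np.clip((scaled * levels).astype(int), 0, levels - 1)


def glcm(gray, levels, dr=0, dc=1):
    """Normalized gray-level co-occurrence matrix for the offset (dr, dc), dr, dc >= 0."""
    rows, cols = gray.shape
    a = gray[:rows - dr, :cols - dc]  # reference pixels
    b = gray[dr:, dc:]                # neighbors at the offset
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1.0)
    total = m.sum()
    return m / total if total > 0 else m


def texture_features(p):
    """Contrast, energy and homogeneity of a normalized GLCM p (assumed feature set)."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sum(p ** 2)
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))
    return np.array([contrast, energy, homogeneity])


def bin_features(spec, bin_width, levels=8):
    """One texture-feature vector per bin, using a single gray scale for all bins."""
    lo, hi = spec.min(), spec.max()
    return np.array([
        texture_features(glcm(quantize(b, levels, lo, hi), levels))
        for b in split_into_bins(spec, bin_width)
    ])


def adjacent_distances(feats):
    """Euclidean distance between each pair of neighboring bins (length n_bins - 1)."""
    return np.linalg.norm(np.diff(feats, axis=0), axis=1)


def self_similarity(feats):
    """Cosine self-similarity matrix of the bin sequence (n_bins x n_bins)."""
    unit = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    return unit @ unit.T


rng = np.random.default_rng(0)
spec = np.abs(rng.normal(size=(64, 40)))  # random stand-in for a real spectrogram
feats = bin_features(spec, bin_width=4)
print(feats.shape)                      # (10, 3)
print(adjacent_distances(feats).shape)  # (9,)
print(self_similarity(feats).shape)     # (10, 10)
```

Using one global gray scale (`lo`, `hi`) for every bin keeps the feature vectors of different bins comparable; quantizing each bin on its own scale would hide loudness changes between neighboring bins.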

Notes

  1.

    Since space is limited, see [8, 5, 6] for more detail.

  2.

    Note that \(b_{i,i+1}\) denotes the strength of the boundary between bins \(b_i\) and \(b_{i+1}\), and \(b_{i,\,i+1\,i+2}\) denotes the strength of the boundary between \(b_i\) and \(b_{i+1\,i+2}\).
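The bottom-up merging step can be illustrated with a small greedy sketch. The text here does not state how the strength of a boundary next to an already-merged segment (the boundary between \(b_i\) and a merged bin, as in Note 2) is computed, so this version takes a hypothetical `strength(left, right)` callback over two spans of bin indices and, as an assumption, always merges the adjacent pair with the weakest boundary first. The example callback `mean_gap`, which compares mean one-dimensional features, is likewise only illustrative and is not the authors' boundary measure.

```python
from typing import Callable, List, Sequence, Tuple, Union

# A dendrogram node is either a bin index or a pair of child nodes.
Tree = Union[int, tuple]


def merge_bins(
    n_bins: int, strength: Callable[[range, range], float]
) -> Tuple[Tree, List[Tuple[range, range, float]]]:
    """Greedily merge adjacent segments, weakest boundary first (assumed rule).

    Each segment is (span, tree): span is the range of bin indices it covers.
    Returns the final dendrogram and the merge history.
    """
    if n_bins < 1:
        raise ValueError("need at least one bin")
    segments = [(range(i, i + 1), i) for i in range(n_bins)]
    history = []
    while len(segments) > 1:
        # Boundary strength between every pair of adjacent segments.
        scores = [strength(segments[k][0], segments[k + 1][0])
                  for k in range(len(segments) - 1)]
        k = min(range(len(scores)), key=scores.__getitem__)  # ties: leftmost
        (left_span, left_tree), (right_span, right_tree) = segments[k], segments[k + 1]
        history.append((left_span, right_span, scores[k]))
        # Replace the two neighbors with one segment covering both spans.
        segments[k:k + 2] = [(range(left_span.start, right_span.stop),
                              (left_tree, right_tree))]
    return segments[0][1], history


def mean_gap(features: Sequence[float]) -> Callable[[range, range], float]:
    """Hypothetical boundary strength: gap between the mean features of two spans."""
    def strength(left: range, right: range) -> float:
        mean_left = sum(features[i] for i in left) / len(left)
        mean_right = sum(features[i] for i in right) / len(right)
        return abs(mean_left - mean_right)
    return strength


feats = [0.0, 0.1, 5.0, 5.2, 9.0]
tree, history = merge_bins(len(feats), mean_gap(feats))
print(tree)  # ((0, 1), ((2, 3), 4))
```

Every subtree of the result covers a contiguous run of bins, so the nested tuples can be read directly as a hierarchy of time spans; `history` records the order of the merges and the boundary strength at which each one happened.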

References

  1. Chen, R., Li, M.: Music structural segmentation by combining harmonic and timbral information. In: Proceedings of ISMIR, pp. 477–482 (2011)

  2. Costa, Y.M.G., Oliveira, L.S., Koerich, A.L., Gouyon, F.: Comparing textural features for music genre classification. In: Proceedings of the 2012 International Joint Conference on Neural Networks, pp. 1867–1872 (2012)

  3. Foote, J.: Visualizing music and audio using self-similarity. In: Proceedings of the 7th ACM International Conference on Multimedia, pp. 77–80 (1999)

  4. Foote, J.: Automatic audio segmentation using a measure of audio novelty. In: Proceedings of IEEE International Conference on Multimedia and Expo, vol. 1, pp. 452–455 (2000)

  5. Hamanaka, M., Hirata, K., Tojo, S.: Implementing “A Generative Theory of Tonal Music”. J. New Music Res. 35(4), 249–277 (2007)

  6. Hamanaka, M., Hirata, K., Tojo, S.: Implementing methods for analysing music based on Lerdahl and Jackendoff’s Generative Theory of Tonal Music. Computational Music Analysis, pp. 221–249. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-25931-4_9

  7. Haralick, R.M.: Statistical and structural approaches to texture. Proc. IEEE 67(5), 786–804 (1979)

  8. Lerdahl, F., Jackendoff, R.: A Generative Theory of Tonal Music. The MIT Press (1983)

  9. McFee, B., Ellis, D.P.W.: Analyzing song structure with spectral clustering. In: Proceedings of ISMIR, pp. 405–410 (2014)

  10. McFee, B., Ellis, D.P.W.: Learning to segment songs with ordinal linear discriminant analysis. In: Proceedings of ICASSP (2014)

  11. Nakashika, T., Garcia, C., Takiguchi, T.: Local-feature-map integration using convolutional neural networks for music genre classification. In: Proceedings of Interspeech, ISCA, pp. 1752–1755 (2012)

  12. Ullrich, K., Schlüter, J., Grill, T.: Boundary detection in music structure analysis using convolutional neural networks. In: Proceedings of ISMIR, pp. 417–422 (2014)

  13. Goto, M., Hashiguchi, H., Nishimura, T., Oka, R.: RWC Music Database: popular, classical and jazz music databases. In: Proceedings of ISMIR, pp. 287–288 (2002)

Acknowledgement

This work has been supported by JSPS Kakenhi 16H01744.

Author information

Corresponding author

Correspondence to Shun Sawada.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Sawada, S., Takegawa, Y., Hirata, K. (2018). On Hierarchical Clustering of Spectrogram. In: Aramaki, M., Davies, M., Kronland-Martinet, R., Ystad, S. (eds) Music Technology with Swing. CMMR 2017. Lecture Notes in Computer Science, vol. 11265. Springer, Cham. https://doi.org/10.1007/978-3-030-01692-0_16

  • DOI: https://doi.org/10.1007/978-3-030-01692-0_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01691-3

  • Online ISBN: 978-3-030-01692-0

  • eBook Packages: Computer Science, Computer Science (R0)