On Hierarchical Clustering of Spectrogram

Sawada, Shun; Takegawa, Yoshinari; Hirata, Keiji

doi:10.1007/978-3-030-01692-0_16

Conference paper
First Online: 24 November 2018

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11265)

Abstract

We propose a new method that applies the Generative Theory of Tonal Music directly to a spectrogram of music to produce a time-span segmentation in the form of a hierarchical clustering. We first regard a vertically long rectangle in a spectrogram (a bin) as a pitch event, and the spectrogram as a sequence of bins. The texture feature of each bin is extracted using a gray-level co-occurrence matrix, which yields a sequence of texture features. The proximity and change of phrases are calculated from the distance between the texture features of adjacent bins. Global structures such as parallelism and repetition are detected with a self-similarity matrix of the sequence of bins. We develop an algorithm that, given a sequence of boundary strengths between adjacent bins, iteratively merges adjacent bins in a bottom-up manner and finally generates a dendrogram, which corresponds to a time-span segmentation. In an experiment using Mozart’s K.331 and K.550 as input, we obtained promising results even though the algorithm takes almost no musical knowledge, such as pitch and harmony, into account.
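
The feature-extraction steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bin width, the number of gray levels, the horizontal co-occurrence offset, the four Haralick-style statistics, the Euclidean distance, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def quantize(spec, levels=8):
    """Map spectrogram magnitudes to integer gray levels 0..levels-1."""
    lo, hi = spec.min(), spec.max()
    if hi == lo:
        return np.zeros(spec.shape, dtype=int)
    scaled = (spec - lo) / (hi - lo)
    return np.minimum((scaled * levels).astype(int), levels - 1)

def glcm(patch, levels=8, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for offset (dy, dx)."""
    h, w = patch.shape
    src = patch[0:h - dy, 0:w - dx]   # reference pixels
    dst = patch[dy:h, dx:w]           # neighbors at the given offset
    m = np.zeros((levels, levels), dtype=float)
    np.add.at(m, (src.ravel(), dst.ravel()), 1.0)
    total = m.sum()
    return m / total if total > 0 else m

def texture_features(p):
    """Assumed feature set: contrast, energy, homogeneity, entropy."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sum(p ** 2)
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))
    return np.array([contrast, energy, homogeneity, entropy])

def bin_features(spec, bin_width=4, levels=8):
    """Cut the spectrogram into vertical bins; one feature vector per bin."""
    q = quantize(spec, levels)
    n_bins = q.shape[1] // bin_width
    feats = [
        texture_features(glcm(q[:, k * bin_width:(k + 1) * bin_width], levels))
        for k in range(n_bins)
    ]
    return np.vstack(feats)

def adjacent_distances(feats):
    """Distance between each pair of neighboring bins (local change)."""
    return np.linalg.norm(feats[1:] - feats[:-1], axis=1)

def self_distance_matrix(feats):
    """Pairwise distances between all bins; 0 means identical texture."""
    diff = feats[:, None, :] - feats[None, :, :]
    return np.linalg.norm(diff, axis=2)
```

Here `spec` is assumed to be a 2-D magnitude array with frequency on the rows and time on the columns. Low values off the diagonal of the self-distance matrix would indicate candidate repetitions, while large adjacent distances would indicate candidate phrase boundaries.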

Notes

1. Since space is limited, see the literature [5, 6, 8] for more detail.
2. Note that \(b_{i,i+1}\) denotes the boundary strength between bins \(b_i\) and \(b_{i+1}\), and \(b_{i,i+1\,i+2}\) denotes that between \(b_i\) and the merged bin \(b_{i+1\,i+2}\).
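
The bottom-up merging that this notation refers to can be sketched as follows. This is an illustrative simplification, not the authors' algorithm: it assumes that the weakest remaining boundary is merged first, that ties go to the leftmost boundary, and that the strengths of the other boundaries stay unchanged after a merge (the rule for recomputing a strength such as \(b_{i,i+1\,i+2}\) is not given here). The function name `merge_bins` is hypothetical.

```python
def merge_bins(strengths):
    """Greedy bottom-up merging of adjacent bins.

    strengths[k] is the boundary strength between bin k and bin k+1.
    Returns the merge steps as (left_segment, right_segment, strength),
    where a segment is an inclusive (first_bin, last_bin) pair. Read in
    order, the steps describe a dendrogram from leaves to root.
    """
    segments = [(i, i) for i in range(len(strengths) + 1)]
    bounds = list(strengths)
    merges = []
    while bounds:
        # Assumption: always remove the weakest remaining boundary.
        k = min(range(len(bounds)), key=bounds.__getitem__)
        left, right = segments[k], segments[k + 1]
        merges.append((left, right, bounds[k]))
        # Replace the two segments by their union; other boundary
        # strengths are kept as they are (a simplifying assumption).
        segments[k:k + 2] = [(left[0], right[1])]
        del bounds[k]
    return merges
```

For example, `merge_bins([0.5, 0.1, 0.9, 0.3])` first merges bins 1 and 2, then bins 3 and 4, then bin 0 with the segment (1, 2), and finally the two remaining segments across the strongest boundary.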

References

1. Chen, R., Li, M.: Music structural segmentation by combining harmonic and timbral information. In: Proceedings of ISMIR, pp. 477–482 (2011)
2. Costa, Y.M.G., Oliveira, L.S., Koerich, A.L., Gouyon, F.: Comparing textural features for music genre classification. In: Proceedings of the 2012 International Joint Conference on Neural Networks, pp. 1867–1872 (2012)
3. Foote, J.: Visualizing music and audio using self similarity. In: Proceedings of the 7th ACM International Conference on Multimedia, pp. 77–80 (1999)
4. Foote, J.: Automatic audio segmentation using a measure of audio novelty. In: Proceedings of IEEE International Conference on Multimedia and Expo, vol. 1, pp. 452–455 (2000)
5. Hamanaka, M., Hirata, K., Tojo, S.: Implementing “A Generative Theory of Tonal Music”. J. New Music Res. 35(4), 249–277 (2007)
6. Hamanaka, M., Hirata, K., Tojo, S.: Implementing methods for analysing music based on Lerdahl and Jackendoff’s Generative Theory of Tonal Music. In: Computational Music Analysis, pp. 221–249. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-25931-4_9
7. Haralick, R.M.: Statistical and structural approaches to texture. Proc. IEEE 67(5), 786–804 (1979)
8. Lerdahl, F., Jackendoff, R.: A Generative Theory of Tonal Music. The MIT Press (1983)
9. McFee, B., Ellis, D.P.W.: Analyzing song structure with spectral clustering. In: Proceedings of ISMIR, pp. 405–410 (2014)
10. McFee, B., Ellis, D.P.W.: Learning to segment songs with ordinal linear discriminant analysis. In: Proceedings of ICASSP (2014)
11. Nakashika, T., Garcia, C., Takiguchi, T.: Local-feature-map integration using convolutional neural networks for music genre classification. In: Proceedings of Interspeech, ISCA, pp. 1752–1755 (2012)
12. Ullrich, K., Schlüter, J., Grill, T.: Boundary detection in music structure analysis using convolutional neural networks. In: Proceedings of ISMIR, pp. 417–422 (2014)
13. Goto, M., Hashiguchi, H., Nishimura, T., Oka, R.: RWC Music Database: popular, classical and jazz music databases. In: Proceedings of ISMIR, pp. 287–288 (2002)

Acknowledgement

This work has been supported by JSPS Kakenhi 16H01744.

Author information

Authors and Affiliations

Future University Hakodate, Hokkaido, 041-8655, Japan
Shun Sawada, Yoshinari Takegawa & Keiji Hirata

Corresponding author

Correspondence to Shun Sawada.

Editor information

Editors and Affiliations

Laboratoire PRISM, AMU-CNRS, Marseille, France
Mitsuko Aramaki
INESC TEC, Porto, Portugal
Matthew E. P. Davies
Laboratoire PRISM, AMU-CNRS, Marseille, France
Richard Kronland-Martinet
Laboratoire PRISM, AMU-CNRS, Marseille, France
Sølvi Ystad

Cite this paper

Sawada, S., Takegawa, Y., Hirata, K. (2018). On Hierarchical Clustering of Spectrogram. In: Aramaki, M., Davies, M., Kronland-Martinet, R., Ystad, S. (eds) Music Technology with Swing. CMMR 2017. Lecture Notes in Computer Science, vol 11265. Springer, Cham. https://doi.org/10.1007/978-3-030-01692-0_16

DOI: https://doi.org/10.1007/978-3-030-01692-0_16
Published: 24 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01691-3
Online ISBN: 978-3-030-01692-0
eBook Packages: Computer Science, Computer Science (R0)
