A Bag-of-Tones Model with MFCC Features for Musical Genre Classification

  • Zengchang Qin
  • Wei Liu
  • Tao Wan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8346)


Musical genres are categorical labels created by humans to characterize pieces of music. These labels can be highly subjective, but they are typically related to the instrumentation, rhythmic structure, and harmonic content of the music. In this paper, we propose a model for music genre classification, referred to as the bag-of-tones (BOT) model, which is conceptually similar to the bag-of-words (BOW) model in natural language processing and the bag-of-features (BOF) model in image processing. Low-level music features such as Mel-frequency cepstral coefficients (MFCC) are clustered into a set of codewords referred to as “tones”. Each piece of music can then be represented by a new feature vector: its distribution over tones. Classical machine learning models such as support vector machines (SVM) can be applied to this representation for genre classification. The model is tested on two datasets, and we found that the polynomial kernel function gives the best performance in SVM classification. Compared with previous work, the proposed model outperforms classical models on a benchmark dataset. More generally, this model can be used to organize the large collections of music available on the Web, and it can play an important role in automatic digital music categorization and retrieval.
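The pipeline in the abstract (cluster MFCC frames into "tones", describe each track as a distribution over tones, then classify with a polynomial-kernel SVM) can be sketched as below. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation: MFCC frames are assumed to be already extracted (for example with `librosa.feature.mfcc`, not shown), plain k-means is assumed as the clustering step because the abstract only says the features are "clustered", and all function names and the kernel parameters `degree`, `gamma`, and `coef0` are hypothetical, illustrative choices rather than values from the paper.

```python
import random
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_tone(frame: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codeword ("tone") closest to one MFCC frame."""
    return min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))


def build_codebook(frames: Sequence[Sequence[float]], n_tones: int,
                   n_iter: int = 20, seed: int = 0) -> List[Vector]:
    """Cluster pooled MFCC frames into n_tones codewords.

    Assumption: plain k-means (Lloyd's algorithm); the abstract does not name
    the clustering method.
    """
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(list(frames), n_tones)]
    for _ in range(n_iter):
        # Assignment step: group frames by their nearest codeword.
        clusters: List[List[Sequence[float]]] = [[] for _ in range(n_tones)]
        for f in frames:
            clusters[nearest_tone(f, codebook)].append(f)
        # Update step: move each codeword to the mean of its cluster;
        # an empty cluster keeps its previous codeword.
        for k, members in enumerate(clusters):
            if members:
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def bag_of_tones(track_frames: Sequence[Sequence[float]],
                 codebook: Sequence[Sequence[float]]) -> Vector:
    """Normalized histogram of tone occurrences for one track (its BOT vector)."""
    counts = [0] * len(codebook)
    for f in track_frames:
        counts[nearest_tone(f, codebook)] += 1
    total = len(track_frames)
    return [c / total for c in counts]


def polynomial_kernel(x: Sequence[float], y: Sequence[float],
                      degree: int = 3, gamma: float = 1.0, coef0: float = 1.0) -> float:
    """k(x, y) = (gamma * <x, y> + coef0) ** degree; parameter values are illustrative."""
    return (gamma * sum(a * b for a, b in zip(x, y)) + coef0) ** degree


# Toy usage: 2-D "frames" stand in for real MFCC vectors (typically ~13 coefficients per frame).
track_a = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
track_b = [[5.0, 5.0], [5.1, 5.0], [0.0, 0.0]]
codebook = build_codebook(track_a + track_b, n_tones=2)
hist_a = bag_of_tones(track_a, codebook)
hist_b = bag_of_tones(track_b, codebook)
print(hist_a, hist_b, polynomial_kernel(hist_a, hist_b))
```

Because every track is mapped to a fixed-length histogram regardless of its duration, the resulting vectors can be passed directly to any standard SVM; for instance, scikit-learn's `sklearn.svm.SVC(kernel="poly")` uses the same kernel form with `degree`, `gamma`, and `coef0` parameters. In a real experiment the codebook would be built from training tracks only, so that test data does not leak into the tone vocabulary.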


Keywords: bag-of-words, bag-of-tones, MFCC, musical genre classification





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Zengchang Qin (1)
  • Wei Liu (1, 2)
  • Tao Wan (3, 4)

  1. Intelligent Computing and Machine Learning Lab, School of ASEE, Beihang University, Beijing, China
  2. School of Advanced Engineering, Beihang University, Beijing, China
  3. School of Biological Science and Medical Engineering, Beihang University, Beijing, China
  4. Department of Biomedical Engineering, Case Western Reserve University, Cleveland, USA
