Evaluation of Audio Feature Groups for the Prediction of Arousal and Valence in Music
Computer-aided prediction of arousal and valence ratings helps to automatically associate emotions with music pieces, enabling new approaches to music categorisation and recommendation as well as the theoretical analysis of listening habits. The impact of several groups of music properties, such as timbre, harmony, melody, or rhythm, on perceived emotions has often been studied in the literature. However, little work has been done to measure extensively the potential of specific feature groups when they supplement combinations of other features already integrated into the regression model. In our experiment, we measure the performance of multiple linear regression applied to combinations of energy, harmony, rhythm, and timbre audio features to predict arousal and valence ratings. Each group is represented by a smaller number of dimensions estimated with the help of Minimum Redundancy–Maximum Relevance (MRMR) feature selection. The results show that cepstral timbre features are particularly useful for predicting arousal, and that rhythm features are the most relevant for predicting valence.
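The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes synthetic data in place of real audio features, uses absolute Pearson correlation as a simple stand-in for the relevance and redundancy measures of MRMR, and reports training error only. The names `mrmr_select`, `fit_linear`, `training_mse`, and the two group labels are hypothetical.

```python
import numpy as np


def mrmr_select(X, y, k):
    """Greedy MRMR-style selection of k feature indices.

    Assumption: absolute Pearson correlation replaces mutual information.
    """
    n_features = X.shape[1]
    # Relevance: |correlation| between each feature and the target.
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    # Redundancy: |correlation| between every pair of features.
    redundancy = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        remaining = [j for j in range(n_features) if j not in selected]
        # Score = relevance minus mean redundancy with the chosen features.
        scores = [relevance[j] - redundancy[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected


def fit_linear(X, y):
    """Multiple linear regression via least squares, with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def training_mse(X, y):
    """Mean squared error of the fitted model on the data it was fitted to."""
    coef = fit_linear(X, y)
    pred = coef[0] + X @ coef[1:]
    return float(np.mean((y - pred) ** 2))


# Synthetic stand-ins for two feature groups (hypothetical data).
rng = np.random.default_rng(0)
n = 200
groups = {
    "timbre": rng.normal(size=(n, 8)),
    "rhythm": rng.normal(size=(n, 8)),
}
arousal = (
    0.8 * groups["timbre"][:, 0]
    + 0.4 * groups["rhythm"][:, 2]
    + rng.normal(scale=0.1, size=n)
)

# Reduce each group to a few dimensions, then compare group combinations.
selected = {name: mrmr_select(X, arousal, k=3) for name, X in groups.items()}
reduced = {name: groups[name][:, idx] for name, idx in selected.items()}

errors = {
    "timbre": training_mse(reduced["timbre"], arousal),
    "timbre+rhythm": training_mse(
        np.column_stack([reduced["timbre"], reduced["rhythm"]]), arousal
    ),
}
for combo, err in errors.items():
    print(f"{combo}: training MSE = {err:.4f}")
```

Comparing the error of a combination with and without one group gives a rough idea of how much that group adds to features already in the model. A real evaluation would use held-out data or cross-validation rather than training error.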
We thank Philipp Kramer for providing the code and explanations of experiments from his bachelor’s thesis, in particular for the extraction of MFCC and OBSC features.
- Fujishima, T. (1999). Realtime chord recognition of musical sound: a system using common lisp music. In Proceedings of the international computer music conference (ICMC) (pp. 464–467).
- Jiang, D. N., Lu, L., Zhang, H. J., Tao, J. H., & Cai, L. H. (2002). Music type classification by spectral contrast feature. In Proceedings IEEE international conference on multimedia and expo (ICME) (vol. 1, pp. 113–116). IEEE.
- Katayose, H., Imai, M., & Inokuchi, S. (1988). Sentiment extraction in music. In Proceedings of the 9th international conference on pattern recognition (ICPR) (pp. 1083–1087). IEEE.
- Kramer, P. (2016). Relevanz cepstraler Merkmale für Vorhersagen im Arousal-Valence Modell auf Musiksignaldaten. Bachelor’s thesis. TU Dortmund: Department of Computer Science.
- Malik, M., Adavanne, S., Drossos, K., Virtanen, T., Ticha, D., & Jarina, R. (2017). Stacked convolutional and recurrent neural networks for music emotion recognition. CoRR, arXiv:1706.02292.
- Martin, R., & Nagathil, A. (2009). Cepstral modulation ratio regression (CMRARE) parameters for audio signal analysis and classification. In Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP).
- Mauch, M., & Dixon, S. (2010). Approximate note transcription for the improved identification of difficult chords. In J. S. Downie, & R. C. Veltkamp (Eds.), Proceedings of the 11th international society for music information retrieval conference (ISMIR) (pp. 135–140).
- McFee, B., Raffel, C., Liang, D., Ellis, D., McVicar, M., & Battenberg, E. (2015). Librosa: Audio and music signal analysis in python. In Proceedings of the 14th Python in science conference (pp. 1–7).
- McKinney, M. F., & Breebaart, J. (2003). Features for audio and music classification. In Proceedings of international society of music information retrieval conference (ISMIR) (vol. 3, pp. 151–158).
- Müller, M., & Ewert, S. (2011). Chroma toolbox: MATLAB implementations for extracting variants of chroma-based audio features. In A. Klapuri, & C. Leider (Eds.), Proceedings of the 12th international conference on music information retrieval (ISMIR) (pp. 215–220). University of Miami.
- Nagathil, A., & Martin, R. (2016). Signal-level features. In C. Weihs, D. Jannach, I. Vatolkin, & G. Rudolph (Eds.), Music data analysis: foundations and applications (pp. 145–164). CRC Press.
- Panda, R., Malheiro, R., Rocha, B., Oliveira, A., & Paiva, R. P. (2013). Multi-modal music emotion recognition: A new dataset, methodology and comparative analysis. In Proceedings of the 10th international symposium on computer music multidisciplinary research (CMMR). Berlin: Springer.
- Panda, R., Malheiro, R. M., & Paiva, R. P. (2018). Novel audio features for music emotion recognition. IEEE Transactions on Affective Computing, 1–1. https://doi.org/10.1109/TAFFC.2018.2820691
- Panda, R., Rocha, B., & Paiva, R. P. (2013). Dimensional music emotion recognition: Combining standard and melodic audio features. In Proceedings of the 10th international symposium on computer music multidisciplinary research (CMMR). Berlin: Springer.
- Scherer, K. R. (1982). Vokale Kommunikation: Nonverbale Aspekte des Sprachverhaltens. Weinheim/Basel: Beltz.
- Schmidt, E. M., & Kim, Y. E. (2011). Learning emotion-based acoustic features with deep belief networks. In 2011 IEEE workshop on applications of signal processing to audio and acoustics (WASPAA) (pp. 65–68). https://doi.org/10.1109/ASPAA.2011.6082328.
- Soleymani, M., Caro, M. N., Schmidt, E. M., Sha, C. Y., & Yang, Y. H. (2013). 1000 songs for emotional analysis of music. In Proceedings of the 2nd ACM international workshop on crowdsourcing for multimedia (pp. 1–6). USA: CrowdMM 13. https://doi.org/10.1145/2506364.2506365.
- Vatolkin, I., Theimer, W., & Botteck, M. (2010). AMUSE (Advanced MUSic Explorer)—a multitool framework for music data analysis. In J. S. Downie, & R. C. Veltkamp (Eds.), Proceedings of the 11th international society on music information retrieval conference (ISMIR) (pp. 33–38).
- Vatolkin, I., & Rudolph, G. (2018). Comparison of audio features for recognition of western and ethnic instruments in polyphonic mixtures. In Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018 (pp. 554–560). Paris, France.
- Yang, Y. H., & Chen, H. H. (2011). Music emotion recognition. CRC Press.