Abstract
Much of the appeal of music lies in its power to convey emotions and moods and to evoke them in listeners. Consequently, the past decade has witnessed growing interest in the music information retrieval (MIR) community in modeling emotion from musical signals. In this chapter, we present a novel generative approach to music emotion modeling, with a specific focus on the valence–arousal (VA) dimensional model of emotion. The presented generative model, called acoustic emotion Gaussians (AEG), better accounts for the subjectivity of emotion perception through the use of probability distributions. Specifically, it learns from the emotion annotations of multiple subjects a Gaussian mixture model in the VA space, with prior constraints on the corresponding acoustic features of the training music pieces. Such a computational framework is technically sound, capable of learning in an online fashion, and thus applicable to a variety of applications, including user-independent (general) and user-dependent (personalized) emotion recognition, emotion-based music retrieval, and tag-to-VA projection. We report evaluations of these applications of AEG on a large-scale emotion-annotated corpus, AMG1608, to demonstrate the effectiveness of AEG and to showcase how evaluations are conducted in research on emotion-based MIR. Directions for future work are also discussed.
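The core modeling idea in the abstract can be illustrated with a toy sketch (a deliberate simplification, not the authors' implementation): the VA annotations that multiple subjects assign to a clip are modeled as a 2-D Gaussian, and a corpus then becomes a Gaussian mixture over the valence–arousal plane, which yields soft, probabilistic emotion assignments rather than single points. All data, cluster locations, and the equal mixture weights below are hypothetical.

```python
# Toy sketch of the AEG idea: per-clip Gaussians over VA annotations,
# combined into a mixture over the valence-arousal plane. Synthetic data.
import numpy as np

rng = np.random.default_rng(0)

# Simulated annotations for two clips: subjects agree on the general mood
# but disagree in the details (the subjectivity the model captures).
happy = rng.normal([0.6, 0.5], 0.1, size=(200, 2))    # +valence, +arousal
sad = rng.normal([-0.5, -0.4], 0.1, size=(200, 2))    # -valence, -arousal

def fit_gaussian(x):
    """Per-clip Gaussian: mean and covariance of its VA annotations."""
    return x.mean(axis=0), np.cov(x, rowvar=False)

mu_h, cov_h = fit_gaussian(happy)
mu_s, cov_s = fit_gaussian(sad)

def gauss_pdf(x, mu, cov):
    """Density of a 2-D Gaussian at point x."""
    d = x - mu
    return np.exp(-0.5 * d @ np.linalg.inv(cov) @ d) / (
        2 * np.pi * np.sqrt(np.linalg.det(cov)))

def posterior_happy(x):
    """Posterior of the 'happy' component in an equal-weight 2-GMM."""
    ph = gauss_pdf(x, mu_h, cov_h)
    ps = gauss_pdf(x, mu_s, cov_s)
    return ph / (ph + ps)

# A VA point near (0.55, 0.45) is assigned to the happy component with
# near-certainty; points between the clusters would remain ambiguous.
p = posterior_happy(np.array([0.55, 0.45]))
print(round(p, 3))
```

The probabilistic output is what distinguishes this family of models from point-estimate regression: a listener's rating is explained as a draw from a distribution, so inter-subject disagreement is part of the model rather than noise to be averaged away.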
References
Barthet, M., Fazekas, G., Sandler, M.: Multidisciplinary perspectives on music emotion recognition: implications for content and context-based models. In: Proceedings International Symposium Computer Music Modeling and Retrieval, pp. 492–507 (2012)
Bigand, E., Vieillard, S., Madurell, F., Marozeau, J., Dacquet, A.: Multidimensional scaling of emotional responses to music: the effect of musical expertise and of the duration of the excerpts. Cogn. Emot. 19(8), 1113–1139 (2005)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Bottou, L.: Online algorithms and stochastic approximations. In: Saad, D. (ed.) Online Learning and Neural Networks. Cambridge University Press, Cambridge (1998)
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27:1–27:39 (2011)
Chen, Y.A., Wang, J.C., Yang, Y.H., Chen, H.H.: Linear regression-based adaptation of music emotion recognition models for personalization. In: Proceedings IEEE International Conference Acoustics, Speech, and Signal Processing, pp. 2149–2153 (2014)
Chen, Y.A., Yang, Y.H., Wang, J.C., Chen, H.H.: The AMG1608 dataset for music emotion recognition. In: Proceedings IEEE International Conference Acoustics, Speech, and Signal Processing (2015). http://mpac.ee.ntu.edu.tw/dataset/AMG1608/
Chou, W.: Minimum classification error approach in pattern recognition. In: Chou, W., Juang, B.H. (eds.) Pattern Recognition in Speech and Language Processing. CRC Press, New York (2003)
Collier, G.: Beyond valence and activity in the emotional connotations of music. Psychol. Music 35(1), 110–131 (2007)
Davis, J.V., Dhillon, I.S.: Differential entropic clustering of multivariate Gaussians. Adv. Neural Inf. Process. Syst. 19, 337–344 (2007)
Davis, S., Mermelstein, P.: Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Signal Process. 28(4), 357–366 (1980)
Eerola, T.: Modelling emotions in music: advances in conceptual, contextual and validity issues. In: Proceedings AES International Conference (2014)
Eerola, T., Vuoskoski, J.K.: A comparison of the discrete and dimensional models of emotion in music. Psychol. Music 39, 18–49 (2010)
Gabrielsson, A.: Emotion perceived and emotion felt: same or different? Musicae Scientiae, pp. 123–147 (2002)
Gauvain, J., Lee, C.H.: Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Trans. Speech Audio Process. 2, 291–298 (1994)
Gillet, O., Richard, G.: Automatic transcription of drum loops. In: Proceedings IEEE International Conference Acoustics, Speech, and Signal Processing, pp. 269–272 (2004)
Hallam, S., Cross, I., Thaut, M.: The Oxford Handbook of Music Psychology. Oxford University Press, Oxford (2008)
Hevner, K.: Expression in music: a discussion of experimental studies and theories. Psychol. Rev. 48(2), 186–204 (1935)
Hoffman, M., Blei, D., Cook, P.: Easy as CBA: a simple probabilistic model for tagging music. In: Proceedings International Society Music Information Retrieval Conference, pp. 369–374 (2009)
Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings ACM SIGIR Conference Research and Development in Information Retrieval, pp. 50–57 (1999)
Hu, X., Downie, J.S.: When lyrics outperform audio for music mood classification: a feature analysis. In: Proceedings International Society Music Information Retrieval Conference, pp. 619–624 (2010)
Hu, X., Yang, Y.H.: A study on cross-cultural and cross-dataset generalizability of music mood regression models. In: Proceedings Sound and Music Computing Conference (2014)
Hu, X., Downie, J.S., Laurier, C., Bay, M., Ehmann, A.F.: The 2007 MIREX audio mood classification task: Lessons learned. In: Proceedings International Society Music Information Retrieval Conference, pp. 462–467 (2008)
Huq, A., Bello, J.P., Rowe, R.: Automated music emotion recognition: a systematic evaluation. J. New Music Res. 39(3), 227–244 (2010)
Huron, D.: Sweet Anticipation: Music and the Psychology of Expectation. MIT Press, Cambridge (2006)
Imbrasaite, V., Baltrusaitis, T., Robinson, P.: Emotion tracking in music using continuous conditional random fields and relative feature representation. In: Proceedings International Workshop Affective Analysis in Multimedia (2013)
Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20(4), 422–446 (2002)
Juang, B.H., Chou, W., Lee, C.H.: Minimum classification error rate methods for speech recognition. IEEE Trans. Speech Audio Process. 5(3), 257–265 (1997)
Juslin, P.N.: Cue utilization in communication of emotion in music performance: relating performance to perception. J. Exp. Psychol. Hum. Percept. Perform. 16(6), 1797–1813 (2000)
Juslin, P., Laukka, P.: Expression, perception, and induction of musical emotions: a review and a questionnaire study of everyday listening. J. New Music Res. 33(3), 217–238 (2004)
Juslin, P.N., Sloboda, J.A.: Music and Emotion: Theory and Research. Oxford University Press, New York (2001)
Kim, Y.E., Schmidt, E.M., Migneco, R., Morton, B.G., Richardson, P., Scott, J.J., Speck, J.A., Turnbull, D.: Music emotion recognition: A state of the art review. In: Proceedings International Society Music Information Retrieval Conference, pp. 255–266 (2010)
Korhonen, M.D., Clausi, D.A., Jernigan, M.E.: Modeling emotional content of music using system identification. IEEE Trans. Syst. Man Cybern. 36(3), 588–599 (2006)
Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)
Lartillot, O., Toiviainen, P.: A Matlab toolbox for musical feature extraction from audio. In: Proceedings International Conference Digital Audio Effects, pp. 237–244 (2007)
Lonsdale, A.J., North, A.C.: Why do we listen to music? A uses and gratifications analysis. Br. J. Psychol. 102, 108–134 (2011)
Lu, L., Liu, D., Zhang, H.: Automatic mood detection and tracking of music audio signals. IEEE Trans. Audio Speech Lang. Process. 14(1), 5–18 (2006)
MacDorman, K.F., Ough, S., Ho, C.C.: Automatic emotion prediction of song excerpts: index construction, algorithm design, and empirical comparison. J. New Music Res. 36(4), 281–299 (2007)
Madsen, J., Jensen, B.S., Larsen, J.: Modeling temporal structure in music for emotion prediction using pairwise comparisons. In: Proceedings International Society Music Information Retrieval Conference, pp. 319–324 (2014)
Makhoul, J.: Linear prediction: a tutorial review. Proc. IEEE 63(4), 561–580 (1975)
Mathieu, B., Essid, S., Fillon, T., Prado, J., Richard, G.: YAAFE, an easy to use and efficient audio feature extraction software. In: Proceedings International Society Music Information Retrieval Conference, pp. 441–446 (2010)
Panda, R., Rocha, B., Paiva, R.P.: Dimensional music emotion recognition: Combining standard and melodic audio features. In: Proceedings International Symposium Computer Music Modeling and Retrieval (2013)
Paolacci, G., Chandler, J., Ipeirotis, P.: Running experiments on Amazon Mechanical Turk. Judgm. Decis. Making 5(5), 411–419 (2010)
Peeters, G.: A large set of audio features for sound description (similarity and classification) in the CUIDADO project. Technical report, IRCAM, Paris, France (2004)
Pesek, M., et al.: Gathering a dataset of multi-modal mood-dependent perceptual responses to music. In: Proceedings the EMPIRE Workshop (2014)
Raykar, V.C., Yu, S., Zhao, L.H., Valadez, G.H., Florin, C., Bogoni, L., Moy, L.: Learning from crowds. J. Mach. Learn. Res. 11, 1297–1322 (2010)
Reynolds, D.A., Quatieri, T.F., Dunn, R.B.: Speaker verification using adapted Gaussian mixture models. Digital Signal Process. 10(1–3), 19–41 (2000)
Russell, J.A.: A circumplex model of affect. J. Pers. Soc. Psychol. 39(6), 1161–1178 (1980)
Saari, P., Eerola, T.: Semantic computing of moods based on tags in social media of music. IEEE Trans. Knowl. Data Eng. 26(10), 2548–2560 (2014)
Saari, P., Eerola, T., Fazekas, G., Barthet, M., Lartillot, O., Sandler, M.: The role of audio and tags in music mood prediction: a study using semantic layer projection. In: Proceedings International Society Music Information Retrieval Conference, pp. 201–206 (2013)
Schmidt, E.M., Kim, Y.E.: Prediction of time-varying musical mood distributions from audio. In: Proceedings International Society Music Information Retrieval Conference, pp. 465–470 (2010)
Schmidt, E.M., Kim, Y.E.: Modeling musical emotion dynamics with conditional random fields. In: Proceedings International Society Music Information Retrieval Conference, pp. 777–782 (2011)
Schmidt, E.M., Kim, Y.E.: Learning rhythm and melody features with deep belief networks. In: Proceedings International Society Music Information Retrieval Conference, pp. 21–26 (2013)
Schölkopf, B., Smola, A.J., Williamson, R.C., Bartlett, P.L.: New support vector algorithms. Neural Comput. 12, 1207–1245 (2000)
Schubert, E.: Modeling perceived emotion with continuous musical features. Music Percept. 21(4), 561–585 (2004)
Schuller, B., Hage, C., Schuller, D., Rigoll, G.: ‘Mister D.J., Cheer Me Up!’: musical and textual features for automatic mood classification. J. New Music Res. 39(1), 13–34 (2010)
Sen, A., Srivastava, M.S.: Regression Analysis: Theory, Methods, and Applications. Springer Science & Business Media (1990)
Soleymani, M., Caro, M.N., Schmidt, E., Sha, C.Y., Yang, Y.H.: 1000 songs for emotional analysis of music. In: Proceedings International Workshop Crowdsourcing for Multimedia, pp. 1–6 (2013)
Soleymani, M., Aljanaki, A., Yang, Y.H., Caro, M.N., Eyben, F., Markov, K., Schuller, B., Veltkamp, R., Weninger, F., Wiering, F.: Emotional analysis of music: a comparison of methods. In: Proceedings ACM Multimedia, pp. 1161–1164 (2014)
Su, L., Yeh, C.C.M., Liu, J.Y., Wang, J.C., Yang, Y.H.: A systematic evaluation of the bag-of-frames representation for music information retrieval. IEEE Trans. Multimedia 16(5), 1188–1200 (2014)
Wang, M.Y., Zhang, N.Y., Zhu, H.C.: User-adaptive music emotion recognition. In: Proceedings IEEE International Conference Signal Processing, pp. 1352–1355 (2004)
Wang, J.C., Lee, H.S., Wang, H.M., Jeng, S.K.: Learning the similarity of audio music in bag-of-frames representation from tagged music data. In: Proceedings International Society Music Information Retrieval Conference, pp. 85–90 (2011)
Wang, J.C., Wang, H.M., Jeng, S.K.: Playing with tagging: a real-time tagging music player. In: Proceedings IEEE International Conference Acoustics, Speech, and Signal Processing, pp. 77–80 (2012)
Wang, J.C., Yang, Y.H., Chang, K., Wang, H.M., Jeng, S.K.: Exploring the relationship between categorical and dimensional emotion semantics of music. In: Proceedings ACM International Workshop Music Information Retrieval with User-Centered and Multimodal Strategies, pp. 63–68 (2012)
Wang, J.C., Yang, Y.H., Jhuo, I., Lin, Y.Y., Wang, H.M.: The acoustic-visual emotion Gaussians model for automatic generation of music video. In: Proceedings ACM Multimedia, pp. 1379–1380 (2012)
Wang, J.C., Yang, Y.H., Wang, H.M., Jeng, S.K.: The acoustic emotion Gaussians model for emotion-based music annotation and retrieval. In: Proceedings ACM Multimedia, pp. 89–98 (2012)
Wang, J.C., Yang, Y.H., Wang, H.M., Jeng, S.K.: Personalized music emotion recognition via model adaptation. In: Proceedings APSIPA Annual Summit & Conference (2012)
Wang, X., Wu, Y., Chen, X., Yang, D.: A two-layer model for music pleasure regression. In: Proceedings International Workshop Affective Analysis in Multimedia (2013)
Wang, S.Y., Wang, J.C., Yang, Y.H., Wang, H.M.: Towards time-varying music auto-tagging based on CAL500 expansion. In: Proceedings IEEE International Conference Multimedia and Expo, pp. 1–6 (2014)
Wang, J.C., Wang, H.M., Lanckriet, G.: A histogram density modeling approach to music emotion recognition. In: Proceedings IEEE International Conference Acoustics, Speech, and Signal Processing, pp. 698–702 (2015)
Wang, J.C., Yang, Y.H., Wang, H.M., Jeng, S.K.: Modeling the affective content of music with a Gaussian mixture model. IEEE Trans. Affect. Comput. 6(1), 56–68 (2015)
Weninger, F., Eyben, F., Schuller, B.: On-line continuous-time music mood regression with deep recurrent neural networks. In: Proceedings IEEE International Conference Acoustics, Speech, and Signal Processing, pp. 5449–5453 (2014)
Yang, Y.H., Chen, H.H.: Music Emotion Recognition. CRC Press, Boca Raton (2011)
Yang, Y.H., Chen, H.H.: Predicting the distribution of perceived emotions of a music signal for content retrieval. IEEE Trans. Audio Speech Lang. Process. 19(7), 2184–2196 (2011)
Yang, Y.H., Chen, H.H.: Ranking-based emotion recognition for music organization and retrieval. IEEE Trans. Audio Speech Lang. Process. 19(4), 762–774 (2011)
Yang, Y.H., Chen, H.H.: Machine recognition of music emotion: a review. ACM Trans. Intell. Syst. Technol. 3(4) (2012)
Yang, Y.H., Liu, J.Y.: Quantitative study of music listening behavior in a social and affective context. IEEE Trans. Multimedia 15(6), 1304–1315 (2013)
Yang, Y.H., Su, Y.F., Lin, Y.C., Chen, H.H.: Music emotion recognition: The role of individuality. In: Proceedings ACM International Workshop Human-Centered Multimedia, pp. 13–21 (2007)
Yang, Y.H., Lin, Y.C., Cheng, H.T., Chen, H.H.: Mr. Emo: Music retrieval in the emotion plane. In: Proceedings ACM Multimedia, pp. 1003–1004 (2008)
Yang, Y.H., Lin, Y.C., Su, Y.F., Chen, H.H.: A regression approach to music emotion recognition. IEEE Trans. Audio Speech Lang. Process. 16(2), 448–457 (2008)
Yang, Y.H., Lin, Y.C., Chen, H.H.: Personalized music emotion recognition. In: Proceedings ACM SIGIR International Conference Research and Development in Information Retrieval, pp. 748–749 (2009)
Yang, Y.H., Wang, J.C., Chen, Y.A., Chen, H.H.: Model adaptation for personalized music emotion recognition. In: Chen, C.H. (ed.) Handbook of Pattern Recognition and Computer Vision, 5th Edition, World Scientific Publishing Co., Singapore (2015)
Yeh, C.C., Tseng, S.S., Tsai, P.C., Weng, J.F.: Building a personalized music emotion prediction system. In: Advances in Multimedia Information Processing-PCM 2006, pp. 730–739. Springer (2006)
Zentner, M., Grandjean, D., Scherer, K.R.: Emotions evoked by the sound of music: characterization, classification, and measurement. Emotion 8(4), 494 (2008)
Zhu, B., Liu, T.: Research on emotional vocabulary-driven personalized music retrieval. In: Edutainment, pp. 252–261 (2008)
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this chapter
Wang, JC., Yang, YH., Wang, HM. (2016). Affective Music Information Retrieval. In: Tkalčič, M., De Carolis, B., de Gemmis, M., Odić, A., Košir, A. (eds) Emotions and Personality in Personalized Services. Human–Computer Interaction Series. Springer, Cham. https://doi.org/10.1007/978-3-319-31413-6_12
DOI: https://doi.org/10.1007/978-3-319-31413-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31411-2
Online ISBN: 978-3-319-31413-6
eBook Packages: Computer Science (R0)