Abstract
Emotions are increasingly and controversially central to our public life. Compared with text or image data, voice is the most natural and direct way to express one's emotions in real time. With the increasing adoption of smartphone voice dialogue applications (e.g., Siri and Sogou Voice Assistant), large-scale networked voice data can help us better understand, in quantitative terms, the emotional world we live in. In this paper, we study the problem of inferring public emotions from large-scale networked voice data. In particular, we first investigate the primary emotions and the underlying emotion patterns in human-mobile voice communication. We then propose a partially-labeled factor graph model (PFG) that incorporates both acoustic features (e.g., energy, f0, MFCC, LFPC) and correlation features (e.g., individual consistency, time associativity, environment similarity) to automatically infer emotions. We evaluate the proposed model on a real dataset from the Sogou Voice Assistant application. The experimental results verify the effectiveness of the proposed model.
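As a rough illustration of the simplest acoustic feature named above (energy), the following sketch computes short-time energy per frame on a synthetic tone. It is not the authors' feature-extraction pipeline: the function name, the 400-sample frame length, the 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate), and the test signal are all assumptions made for illustration.

```python
import math


def short_time_energy(signal, frame_len=400, hop=160):
    """Return the mean squared amplitude of each frame of `signal`.

    frame_len and hop are illustrative defaults (25 ms / 10 ms at 16 kHz),
    not values taken from the paper.
    """
    if len(signal) < frame_len:
        return []
    n_frames = 1 + (len(signal) - frame_len) // hop
    energies = []
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


# Synthetic input (assumed): one second of a 220 Hz tone at 16 kHz,
# attenuated to 10% amplitude in the first half.
sr = 16000
tone = [math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
tone = [0.1 * x if n < sr // 2 else x for n, x in enumerate(tone)]

energy = short_time_energy(tone)
```

Frames from the attenuated first half yield much lower energy than frames from the second half, which is the kind of contrast an energy feature is meant to capture.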
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ren, Z., Jia, J., Cai, L., Zhang, K., Tang, J. (2014). Learning to Infer Public Emotions from Large-Scale Networked Voice Data. In: Gurrin, C., Hopfgartner, F., Hurst, W., Johansen, H., Lee, H., O’Connor, N. (eds) MultiMedia Modeling. MMM 2014. Lecture Notes in Computer Science, vol 8325. Springer, Cham. https://doi.org/10.1007/978-3-319-04114-8_28
DOI: https://doi.org/10.1007/978-3-319-04114-8_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04113-1
Online ISBN: 978-3-319-04114-8
eBook Packages: Computer Science, Computer Science (R0)