Learning to Infer Public Emotions from Large-Scale Networked Voice Data

Ren, Zhu; Jia, Jia; Cai, Lianhong; Zhang, Kuo; Tang, Jie

doi:10.1007/978-3-319-04114-8_28


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8325)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Emotions are increasingly and controversially central to our public life. Compared to text or image data, voice is the most natural and direct way to express one's emotions in real time. With the increasing adoption of smartphone voice dialogue applications (e.g., Siri and Sogou Voice Assistant), large-scale networked voice data can help us build a better quantitative understanding of the emotional world we live in. In this paper, we study the problem of inferring public emotions from large-scale networked voice data. In particular, we first investigate the primary emotions and the underlying emotion patterns in human-mobile voice communication. We then propose a partially-labeled factor graph model (PFG) that incorporates both acoustic features (e.g., energy, f0, MFCC, LFPC) and correlation features (e.g., individual consistency, time associativity, environment similarity) to infer emotions automatically. We evaluate the proposed model on a real dataset from the Sogou Voice Assistant application. The experimental results verify the effectiveness of the proposed model.
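
The abstract does not spell out how the PFG is defined or trained, so the following is only a minimal sketch of the general idea it names: per-utterance scores (standing in for acoustic evidence) are combined with pairwise links (standing in for correlation features), while the labeled utterances stay fixed. Everything here is an assumption made for illustration: the function name `infer_labels`, the hand-picked scores and link weights, and the use of iterated conditional modes as the inference procedure. It should not be read as the authors' model.

```python
from typing import Dict, List, Tuple


def infer_labels(
    unary: List[List[float]],
    edges: List[Tuple[int, int, float]],
    known: Dict[int, int],
    n_iter: int = 10,
) -> List[int]:
    """Infer labels on a partially labeled pairwise model (illustrative only).

    unary[i][k] is a hypothetical score for giving utterance i emotion k,
    e.g. from a classifier over acoustic features.
    edges holds (i, j, w): utterances i and j are correlated, and w is the
    reward for giving them the same label.
    known maps already-labeled utterances to their labels; these never change.
    """
    n = len(unary)
    n_labels = len(unary[0])

    # Start from the known labels, and the best unary label everywhere else.
    labels = [
        known.get(i, max(range(n_labels), key=lambda k: unary[i][k]))
        for i in range(n)
    ]

    # Build an adjacency list so each node can see its correlated neighbors.
    neighbors: Dict[int, List[Tuple[int, float]]] = {i: [] for i in range(n)}
    for i, j, w in edges:
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))

    # Iterated conditional modes: greedily re-label one node at a time.
    for _ in range(n_iter):
        changed = False
        for i in range(n):
            if i in known:
                continue  # labeled nodes are clamped

            def score(k: int) -> float:
                agree = sum(w for j, w in neighbors[i] if labels[j] == k)
                return unary[i][k] + agree

            best = max(range(n_labels), key=score)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


# Three utterances, two emotions. Utterance 0 is labeled; utterance 1 is
# acoustically ambiguous but linked to utterance 0 (say, same speaker).
scores = [[2.0, 0.0], [0.4, 0.5], [0.0, 2.0]]
with_link = infer_labels(scores, edges=[(0, 1, 1.0)], known={0: 0})
without_link = infer_labels(scores, edges=[], known={0: 0})
```

In this toy setup the link pulls the ambiguous utterance toward the label of its labeled neighbor, which is the intuition behind adding correlation features on top of acoustic ones. A full factor graph model would learn the weights from data rather than fix them by hand.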

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, China
Zhu Ren, Jia Jia, Lianhong Cai & Jie Tang
TNList and Key Laboratory of Pervasive Computing, Ministry of Education, China
Zhu Ren, Jia Jia & Lianhong Cai
Sogou Corporation, Beijing, China
Kuo Zhang

Editor information

Editors and Affiliations

School of Computing, Dublin City University, Dublin 9, Ireland
Cathal Gurrin
Fakultät IV für Elektrotechnik und Informatik, Technische Universität Berlin / DAI-Labor, 10587, Berlin, Germany
Frank Hopfgartner
Department of Information and Computing Sciences, Universiteit Utrecht, 3584 CC, Utrecht, The Netherlands
Wolfgang Hurst
UiT The Arctic University of Norway, 9019, Tromsø, Norway
Håvard Johansen
Singapore University of Technology and Design, Singapore
Hyowon Lee
School of Electrical Engineering, Dublin City University, Ireland
Noel O’Connor

About this paper

Cite this paper

Ren, Z., Jia, J., Cai, L., Zhang, K., Tang, J. (2014). Learning to Infer Public Emotions from Large-Scale Networked Voice Data. In: Gurrin, C., Hopfgartner, F., Hurst, W., Johansen, H., Lee, H., O’Connor, N. (eds) MultiMedia Modeling. MMM 2014. Lecture Notes in Computer Science, vol 8325. Springer, Cham. https://doi.org/10.1007/978-3-319-04114-8_28

DOI: https://doi.org/10.1007/978-3-319-04114-8_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04113-1
Online ISBN: 978-3-319-04114-8
eBook Packages: Computer Science, Computer Science (R0)
