Predicting Relative Popularity via an End-to-End Multi-modality Model

Cai, Hongxiang; Zhang, Ya; Wang, Yanfeng; Wang, Xie; Mei, Jianping; Huang, Zhuowei

doi:10.1007/978-981-10-8108-8_32

Hongxiang Cai¹²,
Ya Zhang¹²,
Yanfeng Wang¹²,
Xie Wang¹³,
Jianping Mei¹⁴ &
Zhuowei Huang¹⁴

Part of the book series: Communications in Computer and Information Science (CCIS, volume 815)

Included in the following conference series:

International Forum on Digital TV and Wireless Multimedia Communications

Abstract

Popularity prediction is important for many applications, such as service design and network management. Among the factors affecting popularity, content plays a key role, especially when historical consumption time-series data are lacking. However, exploring the influence of content factors on popularity is not easy because of the increasingly heterogeneous modalities involved and their sophisticated interplay. In this paper, we use multiple modalities to predict popularity. Meanwhile, because predicting the exact popularity value is difficult and of limited significance, we instead aim to rank pairs of content items, a task called relative popularity prediction. We cast relative popularity prediction as a classification task and propose an end-to-end multi-modality model based on deep neural networks. This model combines visual and textual information, maps them to a common feature space, and implicitly models the interaction between them. Experimental results on real-world data demonstrate the effectiveness of our model.
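The abstract frames relative popularity prediction as binary classification over pairs of items whose visual and textual features are mapped into a common space. The chapter body is not reproduced here, so the following is only a minimal pure-Python sketch of that framing under stated assumptions: the toy feature vectors, the fixed linear projections `w_vis` and `w_txt`, additive fusion in the common space, and a logistic classifier on the difference of fused embeddings are all hypothetical stand-ins. It is not the authors' deep end-to-end network; in particular, this sketch learns only the classifier weights `theta`, whereas an end-to-end model would also learn the projections.

```python
import math
from itertools import combinations


def make_pairs(popularity):
    """Turn absolute popularity counts into labeled pairs (i, j, y).

    y = 1 if item i is more popular than item j, else 0. Ties carry no
    ranking signal, so they are skipped.
    """
    pairs = []
    for i, j in combinations(range(len(popularity)), 2):
        if popularity[i] != popularity[j]:
            pairs.append((i, j, 1 if popularity[i] > popularity[j] else 0))
    return pairs


def project(vec, weights):
    """Linearly map one modality's feature vector into the shared space."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def fuse(visual, textual, w_vis, w_txt):
    """Project both modalities into the common space and add them (assumed fusion rule)."""
    return [a + b for a, b in zip(project(visual, w_vis), project(textual, w_txt))]


def pair_prob(f_i, f_j, theta):
    """P(item i more popular than item j): logistic function of theta . (f_i - f_j)."""
    z = sum(t * (a - b) for t, a, b in zip(theta, f_i, f_j))
    return 1.0 / (1.0 + math.exp(-z))


def train(fused, pairs, lr=1.0, steps=2000):
    """Full-batch gradient descent on the mean binary cross-entropy over pairs."""
    theta = [0.0] * len(fused[0])
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for i, j, y in pairs:
            err = pair_prob(fused[i], fused[j], theta) - y  # dLoss/dz for log loss
            for k in range(len(theta)):
                grad[k] += err * (fused[i][k] - fused[j][k])
        theta = [t - lr * g / len(pairs) for t, g in zip(theta, grad)]
    return theta


def accuracy(fused, pairs, theta):
    """Fraction of pairs whose predicted order matches the label."""
    hits = sum((pair_prob(fused[i], fused[j], theta) > 0.5) == (y == 1) for i, j, y in pairs)
    return hits / len(pairs)


# Toy data, invented for illustration: 2-d "visual" and 2-d "textual" features per item.
visual = [[0.1, 0.0], [0.5, 0.2], [0.9, 0.4], [0.3, 0.9]]
textual = [[0.0, 0.2], [0.4, 0.1], [0.8, 0.3], [0.1, 0.0]]
popularity = [10, 50, 200, 30]
# Fixed, hypothetical projection matrices into a 2-d common space.
w_vis = [[1.0, 0.0], [0.0, 1.0]]
w_txt = [[0.5, 0.0], [0.0, 0.5]]

fused = [fuse(v, t, w_vis, w_txt) for v, t in zip(visual, textual)]
pairs = make_pairs(popularity)
theta = train(fused, pairs)
print("pairwise accuracy:", accuracy(fused, pairs, theta))
```

Scoring the difference of two fused embeddings makes the classifier antisymmetric: the probability that i beats j and the probability that j beats i always sum to one, so each pair needs to be scored in only one order. On this small, linearly separable toy set the classifier orders every training pair correctly; real multi-modal features would call for learned nonlinear encoders instead of fixed projections.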

Author information

Authors and Affiliations

Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, 200240, China
Hongxiang Cai, Ya Zhang & Yanfeng Wang
National Engineering Research Center of Digital Television, Shanghai, China
Xie Wang
China Central Television, Beijing, China
Jianping Mei & Zhuowei Huang

Corresponding author

Correspondence to Hongxiang Cai.

Editor information

Editors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Guangtao Zhai
Shanghai Jiao Tong University, Shanghai, China
Jun Zhou
Shanghai Jiao Tong University, Shanghai, China
Xiaokang Yang

About this paper

Cite this paper

Cai, H., Zhang, Y., Wang, Y., Wang, X., Mei, J., Huang, Z. (2018). Predicting Relative Popularity via an End-to-End Multi-modality Model. In: Zhai, G., Zhou, J., Yang, X. (eds) Digital TV and Wireless Multimedia Communication. IFTC 2017. Communications in Computer and Information Science, vol 815. Springer, Singapore. https://doi.org/10.1007/978-981-10-8108-8_32

DOI: https://doi.org/10.1007/978-981-10-8108-8_32
Published: 03 February 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8107-1
Online ISBN: 978-981-10-8108-8
eBook Packages: Computer Science; Computer Science (R0)