Modeling of Information Diffusion in Sina Weibo Based on Random Forest Classifier and SIR Model
Recent information diffusion models for social networks have not taken the network's topological structure into account. Characteristics such as the degree of connections and the clustering of nodes are known to influence the speed of information propagation. Yet existing models (such as SIR with a single average probability of reposting a received message) are not sophisticated enough to reflect these fine-grained characteristics. Differences among nodes are often overlooked, leading to an inaccurate description of the information dissemination process.
In this work, we study a new approach to predicting the information diffusion probability in a social network. We combine Random Forest classification with the SIR model to analyze the dissemination of information in Sina Weibo. Python crawlers are employed to collect a total of 316,329 microblogs concerning major news events in 2018, together with related features of the nodes. The repost data, which are imbalanced between positive and negative samples and are described by 15 features characterizing the nodes and edges, are rebalanced by SMOTE resampling and then used to train a Random Forest classifier that predicts an individual user's forwarding behavior. Judged by the area under the receiver operating characteristic (ROC) curve (AUC), the Random Forest classifier outperforms a comparable SVM model. Finally, a Susceptible-Infected-Recovered (SIR) information propagation model, with the forwarding rates obtained from the Random Forest classifier as input parameters, is used to simulate the information dissemination process on Weibo. The predicted time evolution of the Susceptible, Infected, and Recovered populations is in good agreement with real-life data obtained from Sina Weibo.
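The idea of driving an SIR simulation with per-user forwarding rates, rather than one average rate, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the exact SIR formulation, so the discrete-time update, the function name `simulate_sir`, and the parameters `followers`, `repost_prob`, and `recover_prob` are all assumptions made for illustration. In particular, `repost_prob` stands in for the forwarding probabilities that a trained classifier would supply.

```python
import random


def simulate_sir(followers, repost_prob, seeds, recover_prob=1.0,
                 max_steps=50, rng=None):
    """Discrete-time SIR on a directed follower graph (illustrative sketch).

    followers[u]   -- users who receive u's posts (hypothetical input format)
    repost_prob[v] -- probability that v reposts on exposure; assumed to come
                      from a classifier, here supplied by hand
    seeds          -- users who post the original message
    Returns a list of (S, I, R) counts, one tuple per time step.
    """
    rng = rng or random.Random(0)

    # Collect every user that appears as a poster or as a follower.
    nodes = set(followers)
    for targets in followers.values():
        nodes.update(targets)

    state = {u: "S" for u in nodes}
    for s in seeds:
        state[s] = "I"

    def counts():
        values = list(state.values())
        return (values.count("S"), values.count("I"), values.count("R"))

    history = [counts()]
    for _ in range(max_steps):
        infected = [u for u in nodes if state[u] == "I"]
        if not infected:
            break

        # Each infected user exposes its susceptible followers, who repost
        # with their own individual probability.
        newly_infected = set()
        for u in infected:
            for v in followers.get(u, []):
                if state[v] == "S" and rng.random() < repost_prob.get(v, 0.0):
                    newly_infected.add(v)

        # Infected users stop spreading with probability recover_prob.
        for u in infected:
            if rng.random() < recover_prob:
                state[u] = "R"

        for v in newly_infected:
            state[v] = "I"
        history.append(counts())
    return history


if __name__ == "__main__":
    # Toy graph with heterogeneous, hand-picked forwarding probabilities.
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": []}
    probs = {"b": 0.9, "c": 0.2, "d": 0.5, "e": 0.1}
    for step, (s, i, r) in enumerate(simulate_sir(graph, probs, seeds=["a"])):
        print(step, s, i, r)
```

Because each follower is tested against its own entry in `repost_prob`, differences among nodes shape the resulting S, I, and R curves, which a single average repost probability cannot capture.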
Keywords: Social network, Information diffusion, Random Forest classifier, Machine learning, SIR model, SMOTE resampling
This work is supported by an internal research grant from Beijing Normal University-Hong Kong Baptist University United International College (UIC), R201809.