Resampling Approaches to Improve News Importance Prediction
Abstract
The methods recommender systems use to produce news rankings are not public, and it is unclear whether they reflect the real importance assigned by readers. We address the task of forecasting the number of times a news item will be tweeted, as a proxy for the importance assigned by its readers. We focus on methods for accurately forecasting which news items will receive a high number of tweets, as these are key to accurate recommendations. Such news is rare, which creates difficulties for standard prediction methods. Recent research has shown that most models fail on tasks where the goal is accuracy on a small subset of rare values of the target variable. To overcome this, we tested resampling approaches in our domain, using several methods for handling imbalanced regression tasks. This paper describes and discusses the results of these experimental comparisons.
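As an illustration of the general idea behind resampling for imbalanced regression, the following minimal sketch performs random under-sampling: it keeps every rare, highly tweeted item and only a random fraction of the common ones, so that a learner sees the rare cases more often in relative terms. It is not the procedure evaluated in the paper. The function name `undersample_common`, the fixed `threshold` on the tweet count, and the `keep_ratio` parameter are all assumptions introduced here for illustration.

```python
import random


def undersample_common(items, threshold, keep_ratio, seed=0):
    """Keep all rare cases and a random fraction of the common ones.

    items: list of (features, tweet_count) pairs.
    threshold: hypothetical tweet count at or above which an item is "rare".
    keep_ratio: fraction of common cases to keep, in (0, 1].
    """
    rng = random.Random(seed)
    rare = [it for it in items if it[1] >= threshold]
    common = [it for it in items if it[1] < threshold]
    # Keep at least one common case whenever any exist.
    n_keep = max(1, round(len(common) * keep_ratio)) if common else 0
    kept = rng.sample(common, n_keep)
    return rare + kept


# Toy data: 100 items whose tweet counts cycle through 0..9.
data = [({"id": i}, i % 10) for i in range(100)]
balanced = undersample_common(data, threshold=9, keep_ratio=0.2)
print(len(data), len(balanced))
```

In this toy run the 10 items with a count of 9 are all retained, while only 18 of the 90 common items survive, raising the share of rare cases from 10% to roughly 36% of the training sample.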
Keywords
Target Variable, Minority Class, Multivariate Adaptive Regression Spline, News Item, News Event