Abstract
We address the task of predicting which news stories are more appealing to a given audience. To do so, we compare the ‘most popular stories’ gathered from several online news outlets over a period of seven months with stories that did not become popular despite appearing on the same page at the same time. We cast this as a learning-to-rank task and train two learning algorithms to reproduce the preferences of the readers within each outlet: the first is based on Support Vector Machines (SVM), the second on the Lasso. Using only words as features, SVM ranking reaches significant accuracy in predicting which article of a given pair readers will prefer. Furthermore, by exploiting the sparsity of the solutions found by the Lasso, we can generate lists of keywords that are expected to attract the attention of each outlet's readers.
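The pairwise set-up described in the abstract can be illustrated with a small, self-contained sketch. It assumes the common pairwise reduction for linear rankers: if story a was popular and story b, shown on the same page at the same time, was not, the model should satisfy w · (x_a - x_b) > 0, where x is a bag-of-words vector. Everything else is an illustrative assumption rather than the authors' implementation. The helper names (`bag_of_words`, `pairwise_differences`, `train_rank_svm`, `train_lasso_rank`) are hypothetical. The SVM is trained by plain subgradient descent on the hinge loss. The Lasso variant regresses every difference vector onto a target of +1 with cyclic coordinate descent. The vocabulary, headlines and hyperparameters are invented toy values.

```python
from collections import Counter

# Hypothetical helper names; a toy sketch, not the authors' implementation.

def bag_of_words(text, vocab):
    """Term-count vector of a story over a fixed vocabulary (no stemming or stop-word removal)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]

def pairwise_differences(pairs, vocab):
    """For each (popular, non_popular) pair, return x_popular - x_non_popular."""
    diffs = []
    for popular, other in pairs:
        a, b = bag_of_words(popular, vocab), bag_of_words(other, vocab)
        diffs.append([ai - bi for ai, bi in zip(a, b)])
    return diffs

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_rank_svm(diffs, lam=0.01, epochs=200, lr=0.1):
    """Linear ranking SVM on difference vectors (assumed formulation):
    minimise lam/2 * ||w||^2 + mean(max(0, 1 - w.d)) by full-batch subgradient descent."""
    n, p = len(diffs), len(diffs[0])
    w = [0.0] * p
    for _ in range(epochs):
        grad = [lam * wi for wi in w]
        for d in diffs:
            if dot(w, d) < 1.0:  # margin violated: hinge term contributes -d / n
                for j in range(p):
                    grad[j] -= d[j] / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def train_lasso_rank(diffs, lam=0.1, sweeps=100):
    """Lasso on difference vectors with target +1 for every pair (assumed formulation):
    minimise 1/(2n) * sum((1 - w.d)^2) + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(diffs), len(diffs[0])
    w = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            col_sq = sum(d[j] ** 2 for d in diffs) / n
            if col_sq == 0.0:
                continue  # word never differs between paired stories: weight stays 0
            # correlation of feature j with the residual that ignores feature j
            rho = sum(d[j] * (1.0 - dot(w, d) + w[j] * d[j]) for d in diffs) / n
            # soft-thresholding sets weakly informative words exactly to zero
            shrunk = max(abs(rho) - lam, 0.0)
            w[j] = (shrunk if rho > 0 else -shrunk) / col_sq
    return w

vocab = ["scandal", "celebrity", "budget", "weather"]
pairs = [  # invented (popular, non-popular) headlines from the same page and time
    ("scandal rocks city hall", "council publishes minutes"),
    ("scandal deepens", "markets steady"),
    ("celebrity wedding photos", "traffic update"),
    ("weather warning and train delays", "weather update and budget review"),
]
diffs = pairwise_differences(pairs, vocab)

w_svm = train_rank_svm(diffs)
pair_accuracy = sum(dot(w_svm, d) > 0 for d in diffs) / len(diffs)

w_lasso = train_lasso_rank(diffs, lam=0.1)
keywords = [word for word, wj in zip(vocab, w_lasso) if wj > 0]
sparser = [word for word, wj in zip(vocab, train_lasso_rank(diffs, lam=0.3)) if wj > 0]
print(pair_accuracy, keywords, sparser)  # 1.0 ['scandal', 'celebrity'] ['scandal']
```

Raising the L1 penalty `lam` from 0.1 to 0.3 shrinks the keyword list from two words to one, which is the sparsity effect the keyword lists rely on. In this toy data `weather` appears on both sides of a pair, so its difference is always zero and it keeps a zero weight in both models.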
Copyright information
© 2010 IFIP
About this paper
Cite this paper
Hensinger, E., Flaounas, I., Cristianini, N. (2010). Learning the Preferences of News Readers with SVM and Lasso Ranking. In: Papadopoulos, H., Andreou, A.S., Bramer, M. (eds) Artificial Intelligence Applications and Innovations. AIAI 2010. IFIP Advances in Information and Communication Technology, vol 339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16239-8_25
DOI: https://doi.org/10.1007/978-3-642-16239-8_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16238-1
Online ISBN: 978-3-642-16239-8
eBook Packages: Computer Science, Computer Science (R0)