Abstract
In recent years, the number of social network users has increased dramatically. The resulting volume of user data has created great opportunities for data mining. One data mining problem of interest for social networks is friendship link prediction. Intuitively, a friendship link between two users can be predicted from their common friends and interests. However, using user interests directly can be challenging, given the large number of possible interests. Earlier approaches addressed this problem with an explicit user interest ontology, but constructing the ontology proved computationally expensive, and the resulting ontology was not very useful. As an alternative, we propose a topic modeling approach to predicting new friendships from interests and existing friendships. Specifically, we use Latent Dirichlet Allocation (LDA) to model user interests, thereby creating an implicit interest ontology. We construct features for the link prediction problem from the resulting topic distributions. Experimental results on several LiveJournal data sets of varying sizes show the usefulness of the LDA features for predicting friendships.
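To make the idea of topic-based features concrete, the following is a minimal sketch of how features for a candidate user pair might be derived from per-user topic distributions and friend lists. The abstract does not say which features the paper uses, so the specific measures here (cosine similarity, Hellinger distance, common-friend count), the function names, and the toy data are all illustrative assumptions. The topic distributions are hard-coded stand-ins for the output of an LDA model.

```python
import math

# Hypothetical per-user topic distributions, standing in for LDA output.
# Each list sums to 1 over three made-up topics.
TOPICS = {
    "alice": [0.7, 0.2, 0.1],
    "bob": [0.6, 0.3, 0.1],
    "carol": [0.1, 0.1, 0.8],
}

# Hypothetical existing friendship lists.
FRIENDS = {
    "alice": {"dave", "erin"},
    "bob": {"dave", "erin"},
    "carol": {"erin"},
}


def cosine_similarity(p, q):
    """Cosine of the angle between two topic vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def hellinger_distance(p, q):
    """Hellinger distance between two probability distributions, in [0, 1]."""
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)


def pair_features(u, v, topics, friends):
    """Build an illustrative feature dict for the candidate link (u, v)."""
    return {
        "common_friends": len(friends[u] & friends[v]),
        "topic_cosine": cosine_similarity(topics[u], topics[v]),
        "topic_hellinger": hellinger_distance(topics[u], topics[v]),
    }


if __name__ == "__main__":
    for pair in [("alice", "bob"), ("alice", "carol")]:
        print(pair, pair_features(*pair, TOPICS, FRIENDS))
```

Feature dictionaries of this kind could then be passed to any standard classifier trained on known friend and non-friend pairs. Working with a small number of topics rather than raw interest strings is what keeps the feature vector short.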
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Parimi, R., Caragea, D. (2011). Predicting Friendship Links in Social Networks Using a Topic Modeling Approach. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 6635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20847-8_7
DOI: https://doi.org/10.1007/978-3-642-20847-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20846-1
Online ISBN: 978-3-642-20847-8
eBook Packages: Computer Science, Computer Science (R0)