Abstract
Community detection is an unsupervised learning task that discovers groups whose members share more similarities, or interact more frequently, with each other than with people outside the group. In social media, link information can reveal heterogeneous relationships of varying strength, but it is often noisy. Different sources of social media data provide complementary information: for example, bookmarking and tagging data indicate user interests, and the frequency of commenting suggests the strength of ties. We therefore propose to integrate social media data of multiple types to improve the performance of community detection, and we present a joint optimization framework for doing so. Empirical evaluation on both synthetic data and real-world social media data shows significant performance improvement for the proposed approach. This work elaborates on the need for, and the challenges of, integrating multiple sources of heterogeneous data, and provides a principled approach to multi-source community detection.
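To make the idea of combining complementary sources concrete, the following is a minimal sketch and not the joint optimization framework of the chapter, whose details the abstract does not give. It swaps in a much simpler baseline: average the symmetrically normalized affinity matrices of each source and split the nodes by the sign of the second-largest eigenvector. The function names, the equal weights, and the six-user toy data (`links`, `tags`) are all hypothetical.

```python
import numpy as np

def adjacency(n, edges):
    """Build a symmetric 0/1 affinity matrix from an undirected edge list."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

def normalize(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes stay zero."""
    d = A.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    nonzero = d > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(d[nonzero])
    return A * inv_sqrt[:, None] * inv_sqrt[None, :]

def two_way_split(sources, weights=None):
    """Split nodes into two groups using several affinity matrices at once.

    Assumption: a weighted average of normalized matrices stands in for
    a real integration scheme; it is only a baseline for illustration.
    """
    if weights is None:
        weights = [1.0 / len(sources)] * len(sources)
    M = sum(w * normalize(A) for w, A in zip(weights, sources))
    _, vecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    v = vecs[:, -2]                      # second-largest eigenvector
    labels = (v > 0).astype(int)
    if labels[0] == 1:                   # resolve the eigenvector sign ambiguity
        labels = 1 - labels
    return labels

# Hypothetical toy data: users 0-2 and users 3-5 form two communities.
# Source 1: friendship links, including one cross-community link (2, 3).
links = adjacency(6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)])
# Source 2: pairs of users who share tags.
tags = adjacency(6, [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)])

labels = two_way_split([links, tags])
print(labels)  # [0 0 0 1 1 1]
```

Equal weights treat both sources as equally reliable; a noisier source could be given a smaller weight through the `weights` argument, which is one of the choices a principled integration method would need to make automatically.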
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tang, J., Wang, X., Liu, H. (2012). Integrating Social Media Data for Community Detection. In: Atzmueller, M., Chin, A., Helic, D., Hotho, A. (eds) Modeling and Mining Ubiquitous Social Media. MUSE 2011, MSM 2011. Lecture Notes in Computer Science, vol 7472. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33684-3_1
DOI: https://doi.org/10.1007/978-3-642-33684-3_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33683-6
Online ISBN: 978-3-642-33684-3
eBook Packages: Computer Science, Computer Science (R0)