Modeling Community Structure and Topics in Dynamic Text Networks

Abstract

The last decade has seen great progress in both dynamic network modeling and topic modeling. This paper draws upon both areas to create a bespoke Bayesian model applied to a dataset consisting of the top 467 US political blogs in 2012, their posts over the year, and their links to one another. Our model allows dynamic topic discovery to inform the latent network model and the network structure to facilitate topic identification. Our results find complex community structure within this set of blogs, where community membership depends strongly upon the set of topics in which the blogger is interested. We examine the time varying nature of the Sensational Crime topic, as well as the network properties of the Election News topic, as notable and easily interpretable empirical examples.

This is a preview of subscription content, log in to check access.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Notes

  1. 1.

    Gingrich, Santorum, and Cain all refer to candidates in the 2012 Republican presidential primary.

  2. 2.

    George Zimmerman shot and killed Trayvon Martin in March of 2012.

  3. 3.

    Newt Gingrich gradually faded to political irrelevance after a failed presidential primary run.

  4. 4.

    Trayvon Martin was a young African American man shot by George Zimmerman, in what he claimed to be an act of self defense, while Martin was walking in Zimmerman’s neighborhood. The Aurora theater massacre was a mass shooting at a movie theater in Aurora, Colorado. The Sikh Temple shooting was a mass shooting at a Sikh temple in Wisconsin. The Sandy Hook massacre was a mass shooting at an elementary school in Connecticut.

References

  1. Airoldi, E.M., Blei, D.M., Fienberg, S.E., Xing, E.P. (2008). Mixed Membership Stochastic Blockmodels. Journal of Machine Learning Research, 9(2008), 1981–2014.

    MATH  Google Scholar 

  2. Arun, R., Suresh, V., Madhavan, C.V., Murthy, M.N. (2010). On finding the natural number of topics with latent dirichlet allocation: Some observations. In Advances in knowledge discovery and data mining (pp. 391–402). Springer.

    Google Scholar 

  3. Blei, D., Ng, A., Jordan, M. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022.

    MATH  Google Scholar 

  4. Blei, D.M., & Lafferty, J.D. (2006). Dynamic topic models. In Proceedings of the 23rd international conference on machine learning (pp. 113–120). ACM.

  5. Brown, P.F., Desouza, P.V., Mercer, R.L., Pietra, V.J.D., Lai, J.C. (1992). Class-based n-gram models of natural language. Computational linguistics, 18 (4), 467–479.

    Google Scholar 

  6. Chang, J., & Blei, D.M. (2009). Relational topic models for document networks. In International conference on artificial intelligence and statistics (pp. 81–88).

  7. Faust, K., & Wasserman, S. (1992). Blockmodels: interpretation and evaluation. Social networks, 14(1), 5–61.

    Article  Google Scholar 

  8. Frank, O., & Strauss, D. (1986). Markov graphs. Journal of the American Statistical Association, 81(395), 832–842.

    MathSciNet  Article  Google Scholar 

  9. Gilks, W.R., Best, N., Tan, K. (1995). Adaptive rejection metropolis sampling within gibbs sampling. Applied Statistics, 44, 455–472.

    Article  Google Scholar 

  10. Ho, Q., Eisenstein, J., Xing, E.P. (2012). Document hierarchies from text and links. In Proceedings of the 21st international conference on World Wide Web (pp. 739–748). ACM.

  11. Hoff, P.D., Raftery, A.E., Handcock, M.S. (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460), 1090–1098.

    MathSciNet  Article  Google Scholar 

  12. Hoffman, M., Bach, F.R., Blei, D.M. (2010). Online learning for latent dirichlet allocation. In Advances in neural information processing systems (pp. 856–864).

  13. Holland, P.W., & Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. Journal of the american Statistical association, 76 (373), 33–50.

    MathSciNet  Article  Google Scholar 

  14. Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.

    Article  Google Scholar 

  15. Hunter, D.R., Goodreau, S.M., Handcock, M.S. (2008). Goodness of Fit of Social Network Models. Journal of the American Statistical Association, 103(481), 248–258.

    MathSciNet  Article  Google Scholar 

  16. Krivitsky, P.N., & Handcock, M.S. (2014). A separable model for dynamic networks. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1), 29–46.

    MathSciNet  Article  Google Scholar 

  17. Latouche, P., Birmelé, E., Ambroise, C. (2011). Overlapping stochastic block models with application to the French political blogosphere. Annals of Applied Statistics, 5(1), 309–336.

    MathSciNet  Article  Google Scholar 

  18. Lawrence, E., Sides, J., Farrell, H. (2010). Self-segregation or deliberation? blog readership, participation, and polarization in american politics. Perspectives on Politics, 8(01), 141.

    Article  Google Scholar 

  19. McNamee, P., & Mayfield, J. (2003). Jhu/apl experiments in tokenization and non-word translation. In Comparative evaluation of multilingual information access systems (pp. 85–97). Springer.

  20. Moody, J. (2004). The structure of a social science collaboration network: disciplinary cohesion from 1963 to 1999. American Sociological Review, 69(2), 213–238.

    Article  Google Scholar 

  21. Newman, M.E.J., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 69 (22), 026113.

    Article  Google Scholar 

  22. Pons, P., & Latapy, M. (2006). Computing communities in large networks using random walks. J. Graph Algorithms Appl., 10(2), 191.

    MathSciNet  Article  Google Scholar 

  23. Ramos, J. (2003). Using tf-idf to determine word relevance in document queries. In Proceedings of the first instructional conference on machine learning.

  24. Robins, G., Elliott, P., Pattison, P. (2001). Network models for social selection processes. Social Networks, 23(1), 1–30.

    Article  Google Scholar 

  25. Snijders, T.A., & Nowicki, K. (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14(1), 75–100.

    MathSciNet  Article  Google Scholar 

  26. Steinley, D. (2004). Properties of the hubert-arable adjusted rand index. Psychological Methods, 9(3), 386.

    Article  Google Scholar 

  27. Technorati. (2002). https://web.archive.org/web/20140420052710/http://technorati.com/.

  28. Wang, E., Silva, J., Willett, R., Carin, L. (2011). Dynamic relational topic model for social network analysis with noisy links. In 2011 IEEE, statistical signal processing workshop (SSP) (pp. 497–500). IEEE.

  29. Wasserman, S., & Pattison, P. (1996). Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p. Psychometrika, 61(3), 401–425.

    MathSciNet  Article  Google Scholar 

  30. Yin, J., & Wang, J. (2014). A dirichlet multinomial mixture model-based approach for short text clustering. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 233–242). ACM.

Download references

Author information

Affiliations

Authors

Corresponding author

Correspondence to Teague R. Henry.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Henry, T.R., Banks, D., Owens-Oas, D. et al. Modeling Community Structure and Topics in Dynamic Text Networks. J Classif 36, 322–349 (2019). https://doi.org/10.1007/s00357-018-9289-3

Download citation

Keywords

  • Networks
  • Natural language processing
  • Topic modeling
  • Political blogs
  • Community detection