Abstract
Topic modeling is widely used to uncover the latent thematic structure of corpora. Spectral methods, which rely on the separability assumption, exploit word co-occurrence patterns at the document level and proceed in two steps: anchor selection and topic recovery. The Biterm Topic Model (BTM), by contrast, uses word co-occurrence patterns across the whole corpus. Inspired by the word-pair pattern in BTM, we build a Word Co-occurrence Network (WCN) in which nodes correspond to words and edge weights are the empirical co-occurrence probabilities of word pairs. We then apply existing graph methods to the WCN for anchor word selection: we find either a K-clique in the unweighted complementary graph or the maximum edge-weight clique in the weighted complementary graph. Experiments on real-world corpora, evaluated on topic quality and interpretability, demonstrate the effectiveness of the proposed approach.
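To make the construction concrete, the following is a minimal sketch of the unweighted variant, not the authors' implementation. It rests on several assumptions that the abstract does not specify: each unordered pair of distinct words is counted once per document, two words are adjacent in the complementary graph when their empirical co-occurrence probability is at or below a hypothetical `threshold`, and the K-clique is found by plain backtracking rather than by the existing methods the paper relies on. The function names and the toy corpus are illustrative only.

```python
from collections import Counter
from itertools import combinations


def build_wcn(docs):
    """Return {(u, v): p} with u < v, where p is the empirical
    co-occurrence probability of the word pair over the whole corpus.
    Assumption: each unordered pair is counted once per document."""
    counts = Counter()
    for doc in docs:
        words = sorted(set(doc))
        counts.update(combinations(words, 2))
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def complement_adjacency(wcn, vocab, threshold=0.0):
    """Unweighted complementary graph (assumed rule): u and v are
    adjacent when their co-occurrence probability is <= threshold."""
    adj = {w: set() for w in vocab}
    for u, v in combinations(sorted(vocab), 2):
        if wcn.get((u, v), 0.0) <= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def find_k_clique(adj, k):
    """Backtracking search for one clique of size k.
    Returns a sorted list of words, or None if no such clique exists.
    Exponential in the worst case; suitable for small examples only."""
    def extend(clique, candidates):
        if len(clique) == k:
            return clique
        if len(clique) + len(candidates) < k:
            return None  # not enough candidates left to reach size k
        for w in sorted(candidates):
            # Only consider later words so each clique is tried once.
            later = {c for c in candidates if c > w}
            found = extend(clique + [w], later & adj[w])
            if found is not None:
                return found
        return None

    return extend([], set(adj))


# Toy corpus (hypothetical): two themes that never share a document.
docs = [
    ["apple", "fruit", "juice"],
    ["goal", "match", "team"],
    ["apple", "fruit"],
    ["goal", "team"],
]
vocab = {w for doc in docs for w in doc}
wcn = build_wcn(docs)
adj = complement_adjacency(wcn, vocab)
anchors = find_k_clique(adj, 2)  # candidate anchor words for K = 2
```

In this toy corpus the words of a candidate anchor set never co-occur with one another, which is the intuition behind searching for a clique in the complementary graph. The weighted variant mentioned in the abstract (maximum edge-weight clique) would need a dedicated solver and is not sketched here.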
Acknowledgments
This research was supported by the National Natural Science Foundation of China (61772219, 61472147), the US Army Research Office (W911NF-14-1-0477), and the Shenzhen Science and Technology Planning Project (JCYJ20170307154749425). We also thank Junru Shao for valuable discussions.
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, W., Zhou, H., He, K., Hopcroft, J.E. (2017). Learning Latent Topics from the Word Co-occurrence Network. In: Du, D., Li, L., Zhu, E., He, K. (eds) Theoretical Computer Science. NCTCS 2017. Communications in Computer and Information Science, vol 768. Springer, Singapore. https://doi.org/10.1007/978-981-10-6893-5_2
DOI: https://doi.org/10.1007/978-981-10-6893-5_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6892-8
Online ISBN: 978-981-10-6893-5
eBook Packages: Computer Science, Computer Science (R0)