Learning Latent Topics from the Word Co-occurrence Network

Wang, Wu; Zhou, Houquan; He, Kun; Hopcroft, John E.

doi:10.1007/978-981-10-6893-5_2

Part of the book series: Communications in Computer and Information Science (CCIS, volume 768)

Included in the following conference series:

National Conference of Theoretical Computer Science

Abstract

Topic modeling is widely used to uncover the latent thematic structure of corpora. The spectral method, which relies on the separability assumption, exploits word co-occurrence patterns at the document level and proceeds in two steps: anchor selection and topic recovery. The Biterm Topic Model (BTM), by contrast, uses word co-occurrence patterns across the whole corpus. Inspired by the word-pair pattern in BTM, we build a Word Co-occurrence Network (WCN) in which nodes correspond to words and edge weights give the empirical co-occurrence probability of word pairs. We then apply existing graph methods to the WCN to select the anchor words: we find a K-clique in the unweighted complementary graph, or the maximum edge-weight clique in the weighted complementary graph. Experiments on real-world corpora, evaluated on topic quality and interpretability, demonstrate the effectiveness of the proposed approach.
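As a rough illustration of the anchor-selection idea in the abstract, the following Python sketch builds a small word co-occurrence network and searches for a K-clique in an unweighted complementary graph. It is not the authors' implementation. The function names (`build_wcn`, `complement_graph`, `find_k_clique`), the toy corpus, the choice to count each word pair once per document, and the `threshold` rule for complement edges are all assumptions made for this example; the abstract does not specify these details, and the weighted maximum edge-weight clique variant is not shown.

```python
from collections import Counter
from itertools import combinations


def build_wcn(docs):
    """Return {(w1, w2): probability} with w1 < w2 from tokenized documents.

    Assumption: each unordered pair of distinct words is counted once per
    document, then normalized by the total number of counted pairs.
    """
    pair_counts = Counter()
    for tokens in docs:
        for w1, w2 in combinations(sorted(set(tokens)), 2):
            pair_counts[(w1, w2)] += 1
    total = sum(pair_counts.values())
    return {pair: count / total for pair, count in pair_counts.items()}


def complement_graph(vocab, wcn, threshold=0.0):
    """Adjacency sets of an unweighted complementary graph.

    Assumed rule: two words are linked when their co-occurrence
    probability is at most `threshold` (a hypothetical parameter).
    """
    adj = {w: set() for w in vocab}
    for w1, w2 in combinations(sorted(vocab), 2):
        if wcn.get((w1, w2), 0.0) <= threshold:
            adj[w1].add(w2)
            adj[w2].add(w1)
    return adj


def find_k_clique(adj, k):
    """Backtracking search for any clique of size k; returns a list or None."""
    def extend(clique, candidates):
        if len(clique) == k:
            return clique
        for w in sorted(candidates):
            # Keep only later words adjacent to w, so each clique is tried once.
            narrowed = {c for c in candidates & adj[w] if c > w}
            found = extend(clique + [w], narrowed)
            if found:
                return found
        return None

    return extend([], set(adj))


# Toy corpus with two loosely separated themes (illustrative data only).
docs = [
    ["goal", "team", "match"],
    ["goal", "team", "coach"],
    ["vote", "party", "election"],
    ["vote", "party", "senate"],
]
vocab = {w for tokens in docs for w in tokens}
wcn = build_wcn(docs)
adj = complement_graph(vocab, wcn)
anchors = find_k_clique(adj, 2)
print(anchors)
```

In this sketch, a clique in the complementary graph is a set of words that (almost) never co-occur with one another, which is the intuition behind treating them as candidate anchor words for distinct topics. The exhaustive backtracking search is only practical for small vocabularies.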

Notes

1. https://pypi.python.org/pypi/lda

Acknowledgments

This research is supported by the National Natural Science Foundation of China (61772219, 61472147), the US Army Research Office (W911NF-14-1-0477), and the Shenzhen Science and Technology Planning Project (JCYJ20170307154749425). We also thank Junru Shao for valuable discussions.

Author information

Authors and Affiliations

Huazhong University of Science and Technology, Wuhan, 430074, China
Wu Wang, Houquan Zhou & Kun He
Computer Science Department, Cornell University, Ithaca, NY, 14853, USA
Kun He & John E. Hopcroft
Shenzhen Research Institute of Huazhong University of Science and Technology, Shenzhen, 518057, China
Wu Wang

Corresponding author

Correspondence to Kun He.

Editor information

Editors and Affiliations

The University of Texas at Dallas, Richardson, Texas, USA
Dingzhu Du
Hefei University of Technology, Hefei, Anhui Province, China
Lian Li
National University of Defense Technology, Changsha, Hunan, China
En Zhu
Huazhong University of Science and Technology, Wuhan, Hubei, China
Kun He

About this paper

Cite this paper

Wang, W., Zhou, H., He, K., Hopcroft, J.E. (2017). Learning Latent Topics from the Word Co-occurrence Network. In: Du, D., Li, L., Zhu, E., He, K. (eds) Theoretical Computer Science. NCTCS 2017. Communications in Computer and Information Science, vol 768. Springer, Singapore. https://doi.org/10.1007/978-981-10-6893-5_2

DOI: https://doi.org/10.1007/978-981-10-6893-5_2
Published: 14 October 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6892-8
Online ISBN: 978-981-10-6893-5
eBook Packages: Computer Science, Computer Science (R0)
