Learning Latent Topics from the Word Co-occurrence Network

  • Conference paper
Theoretical Computer Science (NCTCS 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 768)

Abstract

Topic modeling is widely used to uncover the latent thematic structure of a corpus. Spectral methods, which rely on the separability assumption, exploit word co-occurrence patterns at the document level and proceed in two steps: anchor selection and topic recovery. The Biterm Topic Model (BTM), in contrast, exploits word co-occurrence patterns across the whole corpus. Inspired by the word-pair pattern in BTM, we build a Word Co-occurrence Network (WCN) in which nodes correspond to words and edge weights give the empirical co-occurrence probability of word pairs. We then apply existing methods to the WCN for anchor selection: we find a K-clique in the unweighted complementary graph, or the maximum edge-weight clique in the weighted complementary graph, to select the anchor words. Experiments on real-world corpora, evaluated on topic quality and interpretability, demonstrate the effectiveness of the proposed approach.
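
The anchor-selection idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that a word pair co-occurs when both words appear in the same document, that the unweighted complementary graph links two words whose co-occurrence probability is at most a cutoff, and that a plain backtracking search is enough to find a K-clique on toy data. The names build_wcn, complement_adjacency, find_k_clique and the threshold parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_wcn(docs):
    """Map each unordered word pair (u, v), u < v, to its empirical
    co-occurrence probability. Assumption: a pair co-occurs once per
    document that contains both words."""
    counts = Counter()
    for doc in docs:
        counts.update(combinations(sorted(set(doc)), 2))
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def complement_adjacency(vocab, wcn, threshold=0.0):
    """Unweighted complementary graph (assumed construction): two words
    are linked when their co-occurrence probability is <= threshold."""
    adj = {w: set() for w in vocab}
    for u, v in combinations(sorted(vocab), 2):
        if wcn.get((u, v), 0.0) <= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def find_k_clique(adj, k):
    """Return one clique of size k as a sorted list, or None.
    Simple backtracking; exponential in the worst case."""
    def extend(clique, candidates):
        if len(clique) == k:
            return clique
        if len(clique) + len(candidates) < k:
            return None  # not enough candidates left to reach size k
        for w in sorted(candidates):
            # Keep only later words adjacent to every word chosen so far.
            rest = {c for c in candidates & adj[w] if c > w}
            found = extend(clique + [w], rest)
            if found is not None:
                return found
        return None

    return extend([], set(adj))


docs = [
    ["apple", "banana", "fruit"],
    ["goal", "match", "team"],
    ["apple", "fruit", "juice"],
    ["match", "team", "coach"],
]
wcn = build_wcn(docs)
vocab = {w for doc in docs for w in doc}
adj = complement_adjacency(vocab, wcn)
anchors = find_k_clique(adj, 2)  # candidate anchor words for K = 2
```

On this toy corpus the words of a clique never co-occur with one another, which is the intuition behind treating them as anchor candidates for different topics. The weighted variant mentioned in the abstract (maximum edge-weight clique) would need a dedicated solver and is not sketched here.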

Notes

  1. https://pypi.python.org/pypi/lda

Acknowledgments

This research is supported by the National Natural Science Foundation of China (61772219, 61472147), the US Army Research Office (W911NF-14-1-0477), and the Shenzhen Science and Technology Planning Project (JCYJ20170307154749425). We also thank Junru Shao for valuable discussions.

Corresponding author

Correspondence to Kun He.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wang, W., Zhou, H., He, K., Hopcroft, J.E. (2017). Learning Latent Topics from the Word Co-occurrence Network. In: Du, D., Li, L., Zhu, E., He, K. (eds) Theoretical Computer Science. NCTCS 2017. Communications in Computer and Information Science, vol 768. Springer, Singapore. https://doi.org/10.1007/978-981-10-6893-5_2

  • DOI: https://doi.org/10.1007/978-981-10-6893-5_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6892-8

  • Online ISBN: 978-981-10-6893-5

  • eBook Packages: Computer Science (R0)
