Finding Co-occurring Topics in Wikipedia Article Segments

  • Renzhi Wang
  • Jianmin Wu
  • Mizuho Iwaihara
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8839)


Wikipedia is the largest online encyclopedia, and its articles form a rich source of knowledge and semantic information. When the same topic appears in different articles, those articles are related to each other through that topic. Finding such co-occurring topics is useful for improving the accuracy of querying and clustering, and for contrasting related articles. Existing work on topic alignment and topic relevance detection is based on term occurrence. In this research, we discuss detecting topic relevance by incorporating the latent topics present in article segments, extracted with Latent Dirichlet Allocation (LDA). We also study how segment proximities, arising from segment ordering and hyperlinks, should be incorporated into topic detection and alignment. Experimental results show that our method can find and distinguish three types of co-occurrence.
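As a rough illustration of comparing latent topics across article segments, the sketch below assumes that each segment already has a topic distribution inferred by LDA and flags segment pairs from two articles whose distributions are close. The Hellinger distance, the threshold value, the function names, and the sample distributions are all assumptions made for illustration; they are not taken from the paper, and segment ordering and hyperlink proximity are not modeled here.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    Returns 0.0 for identical distributions and 1.0 for distributions
    with disjoint support.
    """
    return math.sqrt(
        0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    )


def co_occurring_segments(segments_a, segments_b, threshold=0.3):
    """Return (segment_a, segment_b, distance) triples whose topic
    distributions lie within `threshold`, closest pairs first.

    `segments_a` and `segments_b` map a segment name to its topic
    distribution (hypothetical LDA output, one probability per topic).
    """
    pairs = []
    for name_a, theta_a in segments_a.items():
        for name_b, theta_b in segments_b.items():
            distance = hellinger(theta_a, theta_b)
            if distance <= threshold:
                pairs.append((name_a, name_b, round(distance, 3)))
    return sorted(pairs, key=lambda triple: triple[2])


# Made-up topic distributions over three topics for two articles.
article_a = {"History": [0.8, 0.1, 0.1], "Economy": [0.1, 0.8, 0.1]}
article_b = {"Background": [0.7, 0.2, 0.1], "Culture": [0.1, 0.1, 0.8]}

matches = co_occurring_segments(article_a, article_b)
for seg_a, seg_b, dist in matches:
    print(f"{seg_a} <-> {seg_b}: distance {dist}")
```

With these sample values, only the "History" and "Background" segments fall under the threshold, since both put most of their probability mass on the first topic.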


Keywords: LDA, MLE, Link, Wikipedia





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Renzhi Wang (1)
  • Jianmin Wu (1)
  • Mizuho Iwaihara (1)
  1. Graduate School of Information, Production and Systems, Waseda University, Fukuoka, Japan
