
Discovering Correlated Entities from News Archives

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8181)

Abstract

Most textual documents contain references to real-world entities such as people, locations and organizations. Understanding the correlations among these entities underpins many applications, including platforms for constructing social relationships and major search engines. This paper aims to discover entity correlations from news archives by means of a proposed hierarchical Entity Topic Model (hETM). hETM is a semantic analysis model that follows the principles of probabilistic topic models and uses a directed acyclic graph (DAG) to capture arbitrary topic correlations. Entity extraction is performed as a preprocessing step, after which the model employs different generative processes for ordinary words and for entities. Entity correlations are discovered by analyzing the dependencies between entities and their associated topics, together with the correlations among topics. We evaluate the approach on a BBC news dataset, and the results show that the discovered entity correlations are of higher quality than those found by existing methods.
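The abstract does not give hETM's actual scoring formula, so the following is only a toy sketch of the general idea that entity correlations can be derived from entity-topic dependencies combined with topic-topic correlations. Everything in it is an assumption for illustration: the topic names, the entity names, all probabilities, the hand-set topic correlation matrix (standing in for correlations that hETM would derive from its topic DAG), and the bilinear score sum over k1, k2 of p(k1|e1) * corr(k1, k2) * p(k2|e2). It is not the authors' inference procedure.

```python
from itertools import combinations

# Hypothetical "fitted" quantities with made-up numbers (illustration only).
# p_entity_given_topic[k][e]: probability of emitting entity e under topic k.
p_entity_given_topic = {
    "politics": {"Westminster": 0.6, "Downing Street": 0.3, "Wembley": 0.1},
    "football": {"Westminster": 0.05, "Downing Street": 0.05, "Wembley": 0.9},
}
# topic_corr[k1][k2]: hypothetical topic-topic correlation, hand-set here
# (a stand-in for correlations a topic DAG would provide); diagonal = 1.
topic_corr = {
    "politics": {"politics": 1.0, "football": 0.1},
    "football": {"politics": 0.1, "football": 1.0},
}
topic_prior = {"politics": 0.5, "football": 0.5}


def topic_posterior(entity):
    """Compute p(k | e) by Bayes' rule from p(e | k) and the topic prior."""
    joint = {k: topic_prior[k] * p_entity_given_topic[k].get(entity, 0.0)
             for k in topic_prior}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()} if total > 0 else joint


def entity_correlation(e1, e2):
    """Bilinear score: sum over topic pairs of p(k1|e1) * corr(k1,k2) * p(k2|e2)."""
    p1, p2 = topic_posterior(e1), topic_posterior(e2)
    return sum(p1[k1] * topic_corr[k1][k2] * p2[k2] for k1 in p1 for k2 in p2)


entities = ["Westminster", "Downing Street", "Wembley"]
ranked = sorted(combinations(entities, 2),
                key=lambda pair: entity_correlation(*pair), reverse=True)
for e1, e2 in ranked:
    print(f"{e1} - {e2}: {entity_correlation(e1, e2):.3f}")
```

With these toy numbers, the two politics-heavy entities rank as the most correlated pair. The topic-correlation term lets entities that are dominated by different but related topics still receive a non-zero score, which a plain topic-overlap measure would not capture.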





Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, L., Li, C., Ding, Q., Li, L. (2013). Discovering Correlated Entities from News Archives. In: Lin, X., Manolopoulos, Y., Srivastava, D., Huang, G. (eds) Web Information Systems Engineering – WISE 2013. WISE 2013. Lecture Notes in Computer Science, vol 8181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41154-0_13


  • DOI: https://doi.org/10.1007/978-3-642-41154-0_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41153-3

  • Online ISBN: 978-3-642-41154-0

  • eBook Packages: Computer Science, Computer Science (R0)
