Web Person Disambiguation Using Hierarchical Co-reference Model

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9041)

Abstract

As one of the entity disambiguation tasks, Web Person Disambiguation (WPD) identifies different persons with the same name by grouping the search results for each person into a separate cluster. Most current research uses clustering methods for WPD. These approaches require tuning thresholds that are biased towards the training data and may not work well on different datasets. In this paper, we propose a novel approach that uses pairwise co-reference modeling for WPD without the need for threshold tuning. Because person names are named entities, person name disambiguation can use semantic measures based on the so-called co-reference resolution criterion across different documents. The algorithm first forms a forest with person names as observable leaf nodes. It then stochastically tries to form an entity hierarchy by merging names into a sub-tree as a latent entity group if they have a co-referential relationship across documents. Because the joining and partitioning of nodes is based on comparative co-reference values, our method is independent of training data, and thus no parameter tuning is required. Experiments show that this semantics-based method achieves performance comparable to the top two state-of-the-art systems without using any training data. The stochastic approach also gives our algorithm near-linear processing time, making it much more efficient than hierarchical agglomerative clustering (HAC) methods. Because our model allows a small number of upper-level entity nodes to summarize a large number of name mentions, it has much greater semantic representation power and scales much better over large collections of name mentions than HAC-based algorithms.
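The forest-building and stochastic merging steps described above can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified, two-level version (mention leaves under flat entity nodes, with no intermediate sub-entity layers), and the bag-of-words context features, the function names `cosine`, `entity_features` and `disambiguate`, and the acceptance rule are all hypothetical. The rule shows what a threshold-free, comparative decision can look like: a randomly chosen mention moves to a randomly chosen entity only if it is more similar to that entity's aggregated features than to the rest of its current entity, so no absolute similarity cutoff has to be tuned.

```python
import math
import random
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def entity_features(members, mentions, exclude=None) -> Counter:
    """Summarize an entity node by summing its mentions' features, optionally leaving one out."""
    total = Counter()
    for m in members:
        if m != exclude:
            total.update(mentions[m])
    return total


def disambiguate(mentions, steps=2000, seed=0):
    """Hypothetical stochastic mention moves accepted by comparison, not a fixed threshold."""
    rng = random.Random(seed)
    ids = list(mentions)
    # Initial forest: every mention (leaf) starts under its own entity node.
    assign = {m: m for m in ids}
    entities = {m: {m} for m in ids}
    for _ in range(steps):
        m = rng.choice(ids)
        src = assign[m]
        dst = assign[rng.choice(ids)]  # entity of another randomly chosen mention
        if dst == src:
            continue
        # Compare how well m fits its current entity (without itself) vs. the target entity.
        here = cosine(mentions[m], entity_features(entities[src], mentions, exclude=m))
        there = cosine(mentions[m], entity_features(entities[dst], mentions))
        if there > here:
            entities[src].discard(m)
            if not entities[src]:
                del entities[src]  # drop entity nodes that no longer have any mentions
            entities[dst].add(m)
            assign[m] = dst
    return sorted(sorted(group) for group in entities.values())


if __name__ == "__main__":
    # Toy search-result contexts for two different people sharing one name.
    docs = {
        "d1": Counter("basketball bulls nba".split()),
        "d2": Counter("nba bulls championship".split()),
        "d3": Counter("machine learning berkeley".split()),
        "d4": Counter("learning statistics berkeley".split()),
    }
    print(disambiguate(docs))  # [['d1', 'd2'], ['d3', 'd4']]
```

Because each entity node sums its mentions' features, one entity vector stands in for all of its mentions, which loosely mirrors the summarizing role the abstract gives to upper-level entity nodes. Each step touches only one mention and one target entity rather than recomputing all pairwise distances as HAC does. A realistic system would replace raw cosine similarity with a proper co-reference compatibility function and richer features.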

Author information

Correspondence to Jian Xu.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Xu, J., Lu, Q., Li, M., Li, W. (2015). Web Person Disambiguation Using Hierarchical Co-reference Model. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9041. Springer, Cham. https://doi.org/10.1007/978-3-319-18111-0_22

  • DOI: https://doi.org/10.1007/978-3-319-18111-0_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18110-3

  • Online ISBN: 978-3-319-18111-0

  • eBook Packages: Computer Science, Computer Science (R0)
