HEEL: exploratory entity linking for heterogeneous information networks

  • Chengyu Wang
  • Xiaofeng He
  • Aoying Zhou
Regular Paper

Abstract

A heterogeneous information network (HIN) is a ubiquitous data model consisting of multiple types of entities and relations. Names of entities in HINs are inherently ambiguous, which makes it difficult to fully disambiguate a HIN. In this paper, we introduce the task of exploratory entity linking for HINs. Given a partially disambiguated HIN, we aim to link ambiguous names to disambiguated entities in the HIN when their referent entities are present. We also “explore” other alternatives by discovering new entities and adding them to the HIN. We propose a partial classification EM-based approach to address this task. A constrained probability propagation model links surface names to entities in the HIN, and the new entity detection process is modeled as a maximum edge weight clique problem. Experiments show that our method outperforms state-of-the-art methods for entity linking with HINs and for author name disambiguation.
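
To make the last modeling step concrete, the following minimal sketch shows what a maximum edge weight clique problem looks like on a toy graph of unlinked name mentions. It illustrates the problem statement only: the abstract does not say how the graph is built or solved, so the function name, the mention identifiers, the similarity weights, and the exhaustive search are all assumptions made for illustration, not the authors' procedure.

```python
from itertools import combinations


def max_edge_weight_clique(nodes, weights):
    """Exhaustively find the clique with the largest total edge weight.

    nodes   -- list of mention identifiers (hypothetical)
    weights -- dict mapping frozenset({u, v}) to a positive similarity;
               a missing pair means the two mentions are not connected
    """
    best_clique, best_weight = (), 0.0
    for size in range(2, len(nodes) + 1):
        for subset in combinations(nodes, size):
            pairs = [frozenset(p) for p in combinations(subset, 2)]
            # A subset is a clique only if every pair is connected.
            if all(p in weights for p in pairs):
                total = sum(weights[p] for p in pairs)
                if total > best_weight:
                    best_clique, best_weight = subset, total
    return best_clique, best_weight


# Toy data (assumed): four mentions of one ambiguous name with
# made-up pairwise similarities.
mentions = ["m1", "m2", "m3", "m4"]
similarity = {
    frozenset({"m1", "m2"}): 0.9,
    frozenset({"m1", "m3"}): 0.8,
    frozenset({"m2", "m3"}): 0.7,
    frozenset({"m3", "m4"}): 0.95,
}

clique, weight = max_edge_weight_clique(mentions, similarity)
print(clique, round(weight, 2))
```

In this toy graph the three mutually similar mentions form the heaviest clique even though the single strongest edge lies elsewhere, which is the intuition behind grouping mentions into a candidate new entity. Exhaustive search is exponential in the number of mentions, so it is only suitable for tiny examples.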

Keywords

Heterogeneous information network; Exploratory entity linking; Partial classification EM; Author name disambiguation

Notes

Acknowledgements

This work is supported by the National Key Research and Development Program of China under Grant No. 2016YFB1000904. Chengyu Wang is partially supported by the Outstanding Doctoral Dissertation Cultivation Plan of Action under Grant No. YB2016040.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. School of Computer Science and Software Engineering, East China Normal University, Shanghai, China
  2. School of Data Science and Engineering, East China Normal University, Shanghai, China
