Random walk-based entity representation learning and re-ranking for entity search

Abstract

Linked Data (LD) has become a valuable source of factual records, and entity search is a fundamental task on LD: given a query consisting of a set of keywords, retrieve a set of relevant entities in LD. State-of-the-art approaches to entity search are based on information retrieval techniques. This paper first examines these approaches with a traditional evaluation metric, recall@k, to reveal their potential for improvement. To obtain evidence for this potential, it investigates the relationship between queries and answer entities in terms of path lengths on the LD graph. On the basis of that investigation, the paper addresses the learning of entity representations. Existing entity search methods rely on heuristics that determine which fields (i.e., predicates and related entities) constitute an entity representation. Since these heuristics require burdensome human decisions, this paper aims to remove that burden by using a graph proximity measure. To this end, it proposes RWRDoc, an RWR (random walk with restart)-based representation learning method that learns the representation of an entity as a weighted combination of the representations of the entities reachable w.r.t. RWR. RWRDoc is mainly designed to improve recall; as the experiments show, it therefore lacks ranking capability. To improve ranking quality, this paper also proposes a personalized PageRank-based re-ranking method, PPRSD (Personalized PageRank-based Score Distribution), for the retrieved results. PPRSD distributes the relevance scores calculated by text-based entity search methods in a personalized PageRank manner. Experimental evaluations show that RWRDoc improves search quality in terms of recall@1000 and that PPRSD compensates for RWRDoc’s insufficient ranking capability.
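The two ideas in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption: the uniform transition weights, the restart probability `c`, the function names, the toy edges between the three DBpedia entities, and the one-term "documents" are all hypothetical. The sketch only shows the general shape of an RWR-weighted combination of entity term vectors and of a personalized PageRank-style redistribution of text-based scores; it is not the published RWRDoc or PPRSD implementation.

```python
from collections import defaultdict


def random_walk_with_restart(out_links, restart, c=0.15, iters=100):
    """Power iteration for r = (1 - c) * W r + c * restart.

    out_links maps every node to the list of nodes it links to (each target
    must also be a key); restart maps nodes to probabilities summing to 1.
    """
    nodes = list(out_links)
    scores = {n: restart.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: c * restart.get(n, 0.0) for n in nodes}
        for src in nodes:
            targets = out_links[src]
            if not targets:
                # Dangling node: hand its mass back to the restart distribution.
                for n in nodes:
                    nxt[n] += (1 - c) * scores[src] * restart.get(n, 0.0)
                continue
            share = (1 - c) * scores[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        scores = nxt
    return scores


def rwr_representation(out_links, term_vectors, entity, c=0.15):
    """Combine term vectors of reachable entities, weighted by RWR proximity."""
    proximity = random_walk_with_restart(out_links, {entity: 1.0}, c)
    combined = defaultdict(float)
    for other, weight in proximity.items():
        for term, value in term_vectors.get(other, {}).items():
            combined[term] += weight * value
    return dict(combined)


def distribute_scores(out_links, text_scores, c=0.15):
    """Spread text-based relevance scores over the graph, PPR style.

    Assumes at least one score is positive, so normalization is defined.
    """
    total = sum(text_scores.values())
    restart = {n: s / total for n, s in text_scores.items()}
    return random_walk_with_restart(out_links, restart, c)


# Hypothetical toy graph; the edges are illustrative, not taken from DBpedia.
graph = {
    "Toyotomi_Hideyoshi": ["Nagoya", "Japanese_invasions_of_Korea"],
    "Nagoya": ["Toyotomi_Hideyoshi"],
    "Japanese_invasions_of_Korea": ["Toyotomi_Hideyoshi"],
}
# Hypothetical one-term "documents" standing in for TF-IDF vectors.
terms = {
    "Toyotomi_Hideyoshi": {"daimyo": 1.0},
    "Nagoya": {"city": 1.0},
    "Japanese_invasions_of_Korea": {"war": 1.0},
}

rep = rwr_representation(graph, terms, "Nagoya")
reranked = distribute_scores(
    graph, {"Nagoya": 2.0, "Japanese_invasions_of_Korea": 1.0}
)
```

In this toy setting the representation of `Nagoya` picks up the term `war` from an entity two hops away, which is the kind of recall-oriented enrichment the abstract describes, and `Toyotomi_Hideyoshi` receives a non-zero re-ranked score even though it had no text-based score of its own.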

Notes

  1. http://tiny.cc/dbpedia-entity.
  2. https://github.com/iai-group/DBpedia-Entity/tree/master/runs/v2.
  3. http://dbpedia.org/resource/Toyotomi_Hideyoshi.
  4. http://dbpedia.org/resource/Japanese_invasions_of_Korea_(1592-98).
  5. http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html.
  6. http://downloads.dbpedia.org/2015-10/.
  7. http://tiny.cc/dbpedia-entity.
  8. https://github.com/iai-group/DBpedia-Entity/tree/master/runs/v2.
  9. http://dbpedia.org/resource/Toyotomi_Hideyoshi.
  10. http://dbpedia.org/resource/Nagoya.
  11. http://dbpedia.org/resource/Japanese_invasions_of_Korea_(1592-98).


Acknowledgements

This work was partly supported by JSPS KAKENHI Grant Number JP18K18056 and the Artificial Intelligence Research Promotion Foundation.

Author information

Corresponding author

Correspondence to Takahiro Komamizu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Komamizu, T. Random walk-based entity representation learning and re-ranking for entity search. Knowl Inf Syst 62, 2989–3013 (2020). https://doi.org/10.1007/s10115-020-01445-4

Keywords

  • Linked Data
  • Graph analysis
  • Entity representation learning
  • PageRank-based re-ranking
  • Random walk with restart
  • Entity search