What’s New? Analysing Language-Specific Wikipedia Entity Contexts to Support Entity-Centric News Retrieval

Chapter in Transactions on Computational Collective Intelligence XXVI

Part of the book series: Lecture Notes in Computer Science (TCCI, volume 10190)

Abstract

Representation of influential entities, such as celebrities and multinational corporations, on the web can vary across languages, reflecting language-specific entity aspects as well as divergent views on these entities in different communities. An important source of multilingual background knowledge about influential entities is Wikipedia, an online community-created encyclopaedia with more than 280 language editions. Such language-specific information could be applied in entity-centric information retrieval applications, in which users search for relevant documents with very simple queries, mostly just the entity names. In this article we focus on the problem of creating language-specific entity contexts to support entity-centric, language-specific information retrieval applications. First, we discuss alternative ways to build such contexts, including Graph-based and Article-based approaches. Second, we analyse the similarities and differences between these contexts in a case study covering 219 entities and five Wikipedia language editions. Third, we propose a context-based entity-centric information retrieval model that maps documents to an aspect space and applies language-specific entity contexts to perform query expansion. Last, we perform a case study to demonstrate the impact of this model in a news retrieval application. Our study illustrates that the proposed model can effectively improve the recall of entity-centric information retrieval while maintaining high precision, and can provide language-specific results.
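
The two technical ideas in the abstract, comparing language-specific entity contexts and using a context to expand an entity-name query, can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not specify how contexts are built or weighted, so the aspect labels, the weights, the helper names (`jaccard`, `expand_query`, `aspect_vector`, `score`) and the use of Jaccard overlap and cosine similarity are all hypothetical stand-ins. Contexts are assumed to be dictionaries from language-independent, single-token aspect labels to weights, and documents are tokenised with a simple regular expression.

```python
import math
import re
from collections import Counter

# Hypothetical representation: an entity context maps language-independent
# aspect labels (single lowercase tokens here) to weights.
Context = dict[str, float]


def jaccard(context_a: Context, context_b: Context) -> float:
    """Overlap between the aspect sets of two language-specific contexts."""
    a, b = set(context_a), set(context_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def expand_query(entity: str, context: Context, k: int) -> Context:
    """Expand an entity-name query with the k highest-weighted aspects."""
    # Sort by descending weight, breaking ties alphabetically for determinism.
    top = sorted(context.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    query = {entity.lower(): 1.0}
    for aspect, weight in top:
        # setdefault keeps the entity name's own weight if it also appears as an aspect.
        query.setdefault(aspect.lower(), weight)
    return query


def aspect_vector(doc: str, aspects: Context) -> Context:
    """Map a document to aspect space: counts of each query aspect it contains."""
    counts = Counter(re.findall(r"\w+", doc.lower()))
    return {a: float(counts[a]) for a in aspects if counts[a] > 0}


def score(doc: str, query: Context) -> float:
    """Cosine similarity between the document's aspect vector and the query."""
    vec = aspect_vector(doc, query)
    dot = sum(query[a] * v for a, v in vec.items())
    if dot == 0.0:
        return 0.0
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    norm_d = math.sqrt(sum(v * v for v in vec.values()))
    return dot / (norm_q * norm_d)


# Illustrative, made-up contexts for one entity in two language editions.
en_ctx = {"chancellor": 0.9, "cdu": 0.7, "eurozone": 0.4}
de_ctx = {"chancellor": 0.8, "cdu": 0.8, "refugees": 0.6}

print(jaccard(en_ctx, de_ctx))  # 0.5: two shared aspects out of four
query = expand_query("Merkel", de_ctx, k=2)
print(query)  # {'merkel': 1.0, 'cdu': 0.8, 'chancellor': 0.8}
print(score("The chancellor met CDU leaders.", query))
```

In this toy setting a news item that mentions only context aspects (the chancellor and the CDU, but not the name) still receives a non-zero score, which is the recall effect query expansion aims for, while a document that also contains the entity name scores higher. A practical system would need a tuned ranking function and some safeguard against expansion terms that drift off-topic.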

Notes

  1. http://en.wikipedia.org/wiki/List_of_Wikipedias.

  2. http://www.dw.com/en/.

  3. http://www.spiegel.de/international/.

  4. http://www.thelocal.de/.

  5. http://www.theguardian.com/.

  6. http://www.express.co.uk/.

  7. The annotated datasets are accessible at: https://github.com/zhouyiwei/WIKIIRDATA.

Acknowledgments

This work was partially funded by the COST Action IC1302 (KEYSTONE), the ERC under ALEXANDRIA (ERC 339233) and H2020-MSCA-ITN-2014 WDAqua (64279).

Author information

Correspondence to Yiwei Zhou or Alexandra I. Cristea.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Zhou, Y., Demidova, E., Cristea, A.I. (2017). What’s New? Analysing Language-Specific Wikipedia Entity Contexts to Support Entity-Centric News Retrieval. In: Nguyen, N., Kowalczyk, R., Pinto, A., Cardoso, J. (eds) Transactions on Computational Collective Intelligence XXVI. Lecture Notes in Computer Science, vol 10190. Springer, Cham. https://doi.org/10.1007/978-3-319-59268-8_10


  • DOI: https://doi.org/10.1007/978-3-319-59268-8_10


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59267-1

  • Online ISBN: 978-3-319-59268-8

  • eBook Packages: Computer Science, Computer Science (R0)
