Eliminating Incorrect Cross-Language Links in Wikipedia

  • Conference paper
Web Information Systems Engineering – WISE 2017 (WISE 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10570)

Abstract

Many Wikipedia articles that cover the same topic in different language editions are interconnected via cross-language links, which support both the understanding of topics in multiple languages and cross-language information retrieval applications. However, cross-language links are added manually by Wikipedia users and, as such, are often incorrect. In this paper, we propose an approach to automatically eliminate incorrect cross-language links, based on the observation that groups of articles that are pairwise connected through cross-language links form independent connected components. For each incoherent component (i.e., one that contains two or more articles from the same language edition), our approach assigns a correctness score to its cross-language links and removes those with the lowest scores until the component becomes coherent. The results of our evaluation on a snapshot of Wikipedia in 8 languages indicate that the approach is quantitatively promising.
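
The pipeline outlined in the abstract (connected components, a coherence check, removal of low-scored links) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how correctness scores are computed, so the scores here are a hypothetical input, and the greedy loop that drops the lowest-scored link in each incoherent component is only one possible reading of the removal step. Articles are represented as assumed `(language, title)` tuples.

```python
from collections import defaultdict


def components(nodes, links):
    """Connected components of an undirected graph, via depth-first search."""
    adj = defaultdict(set)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nxt in adj[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        result.append(comp)
    return result


def is_coherent(component):
    """A component is coherent if it has at most one article per language."""
    langs = [lang for lang, _title in component]
    return len(langs) == len(set(langs))


def make_coherent(nodes, scored_links):
    """Greedily drop the lowest-scored link of each incoherent component.

    scored_links maps frozenset({a, b}) -> score. The scores are a
    hypothetical input: the scoring function is not given in the abstract.
    Returns the kept links and the list of removed links.
    """
    kept = dict(scored_links)
    removed = []
    while True:
        comps = components(nodes, [tuple(link) for link in kept])
        bad = [c for c in comps if not is_coherent(c)]
        if not bad:
            return kept, removed
        for comp in bad:
            # Links whose two endpoints both lie in this component.
            inside = [link for link in kept if link <= comp]
            worst = min(inside, key=lambda link: kept[link])
            removed.append(worst)
            del kept[worst]


# Toy data: two English articles end up in the same component.
en_paris = ("en", "Paris")
fr_paris = ("fr", "Paris")
de_paris = ("de", "Paris")
en_texas = ("en", "Paris, Texas")
nodes = [en_paris, fr_paris, de_paris, en_texas]
scored_links = {
    frozenset({en_paris, fr_paris}): 0.9,
    frozenset({fr_paris, de_paris}): 0.8,
    frozenset({de_paris, en_texas}): 0.1,  # made-up low score
}
kept, removed = make_coherent(nodes, scored_links)
```

In this toy example the single component contains two English articles, so it is incoherent; removing the link with the made-up score of 0.1 splits it into two coherent components. Recomputing components after every removal keeps the sketch short but would be too slow for a full Wikipedia snapshot.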

Author information

Corresponding author

Correspondence to Gianluca Quercini.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Bennacer, N., Bugiotti, F., Galicia, J., Patricio, M., Quercini, G. (2017). Eliminating Incorrect Cross-Language Links in Wikipedia. In: Bouguettaya, A., et al. Web Information Systems Engineering – WISE 2017. WISE 2017. Lecture Notes in Computer Science, vol 10570. Springer, Cham. https://doi.org/10.1007/978-3-319-68786-5_9

  • DOI: https://doi.org/10.1007/978-3-319-68786-5_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68785-8

  • Online ISBN: 978-3-319-68786-5

  • eBook Packages: Computer Science, Computer Science (R0)
