Abstract
Automatically linking Wikipedia pages can be done either in a content-based manner, by exploiting word similarities, or in a structure-based manner, by exploiting characteristics of the link graph. Our approach focuses on a content-based strategy: it detects Wikipedia titles as link candidates and selects the most relevant ones as links. The relevance calculation is based on the context, i.e. the surrounding text of a link candidate. Our goal was to evaluate the influence of the link context on selecting relevant links and determining a link's best entry point. Results show that a whole Wikipedia page provides the best context for resolving links and that straightforward inverse document frequency based scoring of anchor texts achieves around 4% less Mean Average Precision on the provided data set.
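The inverse document frequency baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the smoothed IDF formula, the whole-word title matching, and the toy document frequencies are all assumptions made for illustration, and the context-based relevance calculation itself is not reproduced here.

```python
import math
import re
from typing import Dict, List, Tuple


def find_candidates(text: str, titles: List[str]) -> List[Tuple[int, str]]:
    """Return (offset, title) pairs for each title found in text as whole words.

    Matching is case-insensitive; this is an assumed, simplified stand-in
    for the title detection step described in the abstract.
    """
    lowered = text.lower()
    found = []
    for title in titles:
        pattern = r"\b" + re.escape(title.lower()) + r"\b"
        for match in re.finditer(pattern, lowered):
            found.append((match.start(), title))
    return sorted(found)


def idf(term: str, doc_freq: Dict[str, int], n_docs: int) -> float:
    """Smoothed inverse document frequency (the smoothing is an assumption)."""
    return math.log((n_docs + 1) / (doc_freq.get(term.lower(), 0) + 1))


def score_anchor(title: str, doc_freq: Dict[str, int], n_docs: int) -> float:
    """Score an anchor text as the sum of the IDF values of its tokens."""
    return sum(idf(token, doc_freq, n_docs) for token in title.split())


def rank_links(text: str, titles: List[str],
               doc_freq: Dict[str, int], n_docs: int) -> List[str]:
    """Rank the distinct candidate titles found in text by descending score."""
    candidates = {title for _, title in find_candidates(text, titles)}
    return sorted(candidates,
                  key=lambda t: score_anchor(t, doc_freq, n_docs),
                  reverse=True)
```

In this sketch, anchors made of rare terms rank above anchors made of frequent terms, which is the intuition behind using IDF to select link candidates without looking at their surrounding context.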
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Granitzer, M., Seifert, C., Zechner, M. (2009). Context Based Wikipedia Linking. In: Geva, S., Kamps, J., Trotman, A. (eds) Advances in Focused Retrieval. INEX 2008. Lecture Notes in Computer Science, vol 5631. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03761-0_36
DOI: https://doi.org/10.1007/978-3-642-03761-0_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03760-3
Online ISBN: 978-3-642-03761-0
eBook Packages: Computer Science (R0)