An Automatic sameAs Link Discovery from Wikipedia

Kagawa, Kosuke; Tamagawa, Susumu; Yamaguchi, Takahira

doi:10.1007/978-3-319-06826-8_29

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8388)

Included in the following conference series:

Joint International Semantic Technology Conference

Abstract

Spelling variants and word sense ambiguity impose substantial costs on processes such as data integration, information search, and data preprocessing for data mining. To meet these demands, it is useful to construct relations between a word or phrase and a representative name of the entity it denotes. To reduce these costs, this paper discusses how to automatically discover “sameAs” and “meaningOf” links from Japanese Wikipedia. To do so, we gathered relevant features such as IDF, string similarity, and the number of hypernyms. We then defined a link-based score over the salient features, which were identified from SVM results on 960,000 anchor link pairs. The case studies show that our link discovery method performs well, with precision and recall both above 70 %.
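
To make two of the features named in the abstract concrete, the following is a minimal Python sketch of how IDF and string similarity might be computed for a single pair of anchor text and linked page title. The abstract does not give the paper's feature definitions, so everything here is an assumption for illustration: the smoothed IDF formula, the use of `difflib.SequenceMatcher` as the similarity measure, the hypothetical helper names (`idf`, `string_similarity`, `pair_features`), and the toy English data.

```python
import math
from difflib import SequenceMatcher


def idf(term, documents):
    """Smoothed inverse document frequency of `term`.

    `documents` is a list of sets of anchor texts (an assumed representation).
    """
    containing = sum(1 for doc in documents if term in doc)
    return math.log((1 + len(documents)) / (1 + containing)) + 1.0


def string_similarity(a, b):
    """Similarity ratio in [0, 1] from difflib; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


def pair_features(anchor_text, page_title, documents):
    """Two-element feature vector for one (anchor text, page title) pair."""
    return [
        idf(anchor_text, documents),
        string_similarity(anchor_text, page_title),
    ]


# Toy corpus (made-up data): each document is the set of anchor texts it contains.
docs = [{"NYC", "subway"}, {"New York City", "NYC"}, {"Tokyo"}]
features = pair_features("NYC", "New York City", docs)
print(features)
```

Vectors of this kind, extended with further features such as the number of hypernyms, could then be passed to a classifier such as the SVM mentioned in the abstract. Rarer anchor texts receive a higher IDF value in this sketch, which is one plausible reason such a feature would help separate ambiguous anchors from distinctive ones.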

Author information

Authors and Affiliations

Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama-shi, Kanagawa, 223-8522, Japan
Kosuke Kagawa, Susumu Tamagawa & Takahira Yamaguchi

Corresponding author

Correspondence to Takahira Yamaguchi.

Editor information

Editors and Affiliations

Yonsei University, Seoul, Republic of Korea
Wooju Kim
Indiana University, Bloomington, Indiana, USA
Ying Ding
Seoul National University, Seoul, Republic of Korea
Hong-Gee Kim

About this paper

Cite this paper

Kagawa, K., Tamagawa, S., Yamaguchi, T. (2014). An Automatic sameAs Link Discovery from Wikipedia. In: Kim, W., Ding, Y., Kim, HG. (eds) Semantic Technology. JIST 2013. Lecture Notes in Computer Science, vol 8388. Springer, Cham. https://doi.org/10.1007/978-3-319-06826-8_29

DOI: https://doi.org/10.1007/978-3-319-06826-8_29
Published: 21 May 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06825-1
Online ISBN: 978-3-319-06826-8
eBook Packages: Computer Science, Computer Science (R0)
