Detecting Synonymous Predicates from Online Encyclopedia with Rich Features

Han, Zhe; Feng, Yansong; Zhao, Dongyan

doi:10.1007/978-3-319-48051-0_9

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9994)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

The integration of Linked Open Data faces great challenges at the semantic level, despite unified data models. Inappropriate use of ontology concepts, namely predicates, impedes knowledge discovery. Although predicate unification is one of the most crucial steps in building a structured knowledge base, little effort has been devoted to it. In this paper, we propose a supervised approach to detecting synonymous predicates. Our work focuses on feature selection and on analyzing the effectiveness of the selected features. We not only leverage different resources, such as Wikipedia and Freebase, but also use different word embeddings to represent predicates. The experimental results indicate that the wikitext defined by Wikipedia and the predicate surface form are the most useful features.
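
The abstract names predicate surface form and word embeddings among the features. As a rough illustration only, and not the authors' implementation, the following Python sketch shows how two such features might be computed for a pair of predicates. The function names, the choice of a character-level similarity ratio, and the toy embedding values are all assumptions made for this example; none of them come from the paper.

```python
from difflib import SequenceMatcher
from math import sqrt


def surface_similarity(p1: str, p2: str) -> float:
    """Character-level similarity ratio in [0, 1] between two predicate names.

    Assumption: a simple string ratio stands in for the paper's
    surface-form feature, whose exact definition is not given here.
    """
    return SequenceMatcher(None, p1, p2).ratio()


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_features(p1: str, p2: str, embeddings: dict[str, list[float]]) -> list[float]:
    """Build a two-element feature vector for a candidate predicate pair.

    A predicate missing from `embeddings` falls back to a zero vector,
    so its embedding feature becomes 0.0.
    """
    zero = [0.0]
    return [
        surface_similarity(p1, p2),
        cosine(embeddings.get(p1, zero), embeddings.get(p2, zero)),
    ]


# Hypothetical toy embeddings, invented for illustration (not data from the paper).
TOY_EMBEDDINGS = {
    "birth_place": [3.0, 4.0],
    "place_of_birth": [3.0, 4.0],
    "spouse": [-4.0, 3.0],
}

if __name__ == "__main__":
    print(pair_features("birth_place", "place_of_birth", TOY_EMBEDDINGS))
    print(pair_features("birth_place", "spouse", TOY_EMBEDDINGS))
```

In a supervised setting like the one the abstract describes, feature vectors of this kind would be computed for labeled predicate pairs and passed to a classifier; the classifier itself is left out of this sketch.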

Notes

1. Based on Chinese Wikipedia web pages in August 2014.
2. Available at https://dumps.wikimedia.org/zhwiki/.
3. The linking property in Freebase rdf dump is Wikipedia.zh-cn_id while the Freebase category predicate is rdf:type.
4. https://zh.wikipedia.org/wiki?curid=472824.
5. http://www.freebase.com/m/03cp9fl.
6. The version of Freebase used in the experiment is 2013-06-02 (1.37 billion triples). We collected categories of 337042 entities in Freebase.

Acknowledgement

This work was supported by the National High Technology R&D Program of China (Grant No. 2015AA015403, 2014AA015102), the Natural Science Foundation of China (Grant No. 61202233, 61272344, 61370055) and the joint project with IBM Research. Please direct any correspondence to Yansong Feng.

Author information

Authors and Affiliations

Institute of Computer Science and Technology, Peking University, Beijing, China
Zhe Han, Yansong Feng & Dongyan Zhao

Corresponding author

Correspondence to Zhe Han.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Shaoping Ma
Renmin University of China, Beijing, China
Ji-Rong Wen
Tsinghua University, Beijing, China
Yiqun Liu
Renmin University of China, Beijing, China
Zhicheng Dou
Tsinghua University, Beijing, China
Min Zhang
Yahoo Labs, Sunnyvale, California, USA
Yi Chang
Renmin University of China, Beijing, China
Xin Zhao

About this paper

Cite this paper

Han, Z., Feng, Y., Zhao, D. (2016). Detecting Synonymous Predicates from Online Encyclopedia with Rich Features. In: Ma, S., et al. Information Retrieval Technology. AIRS 2016. Lecture Notes in Computer Science, vol 9994. Springer, Cham. https://doi.org/10.1007/978-3-319-48051-0_9

DOI: https://doi.org/10.1007/978-3-319-48051-0_9
Published: 15 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48050-3
Online ISBN: 978-3-319-48051-0
eBook Packages: Computer Science (R0)