Abstract
Recently, there has been increasing interest in extracting or mining type information from Web sources. Type information, which states that an instance is of a certain type, is an important component of knowledge bases. Although there has been some work on obtaining type information, most current techniques are either language-dependent or, because of type sparseness, generate one or more general types for a given instance. In this paper, we present a novel approach for mining type information from Chinese online encyclopedias. More precisely, we mine type information from the abstracts, infoboxes and categories of article pages on Chinese encyclopedia Web sites. In particular, most of the generated Chinese type information is inferred from the categories of article pages through an attribute propagation algorithm and a graph-based random walk method. We conduct experiments on three Chinese encyclopedia Web sites: Baidu Baike, Hudong Baike and Chinese Wikipedia. Experimental results show that our approach can generate large-scale, high-quality Chinese type information with types of appropriate granularity.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wu, T., Ling, S., Qi, G., Wang, H. (2015). Mining Type Information from Chinese Online Encyclopedias. In: Supnithi, T., Yamaguchi, T., Pan, J., Wuwongse, V., Buranarach, M. (eds) Semantic Technology. JIST 2014. Lecture Notes in Computer Science, vol 8943. Springer, Cham. https://doi.org/10.1007/978-3-319-15615-6_16
DOI: https://doi.org/10.1007/978-3-319-15615-6_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15614-9
Online ISBN: 978-3-319-15615-6
eBook Packages: Computer Science, Computer Science (R0)