Abstract
Inaccurate knowledge remains a problem in the Semantic Web. Knowledge inaccuracy in Knowledge Bases (KBs) is one of the largest obstacles limiting the development of Linked Open Data (LOD) and Knowledge Graphs (KGs). To address the semantic ambiguity and improper classification of knowledge triples that arise when constructing Chinese online encyclopedia KBs, this paper proposes a new iterative filtering method called IFTA. First, a new algorithm, TF-AICL, calculates the concentration level of predicates in each top-category. Second, the predicate that best represents the features of a top-category is selected, and the related predicate candidate set is extracted. Third, following a positive and negative example counting strategy, the predicate candidate set serves as the comparison group for filtering each entity. TF-AICL is a new predicate feature extraction method that considers the hierarchical features of predicates. In addition, IFTA can automatically prune, filter and refine large-scale online encyclopedia knowledge in an iterative way. Precision, recall and F-measure results on the BaiduBaike and Hudong datasets indicate that IFTA refines open-domain Chinese encyclopedia KBs more effectively than the state-of-the-art methods.
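The filtering loop summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the TF-AICL formula, so the `concentration` function below is a stand-in (the share of a predicate's occurrences that fall in a category), not the authors' algorithm. The function names, the `threshold` value, the removal rule (drop an entity when negative examples outnumber positive ones) and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def concentration(triples, category_of):
    """Stand-in score (NOT TF-AICL): for each category, the share of a
    predicate's total occurrences that fall inside that category."""
    per_cat = defaultdict(Counter)
    total = Counter()
    for entity, predicate, _value in triples:
        per_cat[category_of[entity]][predicate] += 1
        total[predicate] += 1
    return {
        cat: {p: n / total[p] for p, n in counts.items()}
        for cat, counts in per_cat.items()
    }


def iterative_filter(triples, category_of, threshold=0.6, max_rounds=10):
    """Repeatedly drop entities whose predicates mostly fall outside the
    candidate predicate set of their assigned top-category."""
    category_of = dict(category_of)  # work on a copy
    for _ in range(max_rounds):
        kept = [t for t in triples if t[0] in category_of]
        scores = concentration(kept, category_of)

        preds_by_entity = defaultdict(set)
        for entity, predicate, _value in kept:
            preds_by_entity[entity].add(predicate)

        removed = []
        for entity, preds in preds_by_entity.items():
            cat = category_of[entity]
            # Candidate set: predicates concentrated in this category.
            candidates = {p for p, s in scores[cat].items() if s >= threshold}
            positive = len(preds & candidates)  # predicates typical of the category
            negative = len(preds - candidates)  # predicates atypical of the category
            if negative > positive:  # hypothetical removal rule
                removed.append(entity)

        if not removed:  # nothing changed, so the refinement has converged
            break
        for entity in removed:
            del category_of[entity]
    return category_of


# Toy data (hypothetical): entity "X" is wrongly filed under "person".
triples = [
    ("A", "birth_date", "1900"), ("A", "occupation", "writer"),
    ("B", "birth_date", "1950"), ("B", "occupation", "painter"),
    ("X", "population", "10000"), ("X", "area", "25"),
    ("C", "population", "500000"), ("C", "area", "130"),
    ("D", "population", "80000"), ("D", "area", "60"),
]
category_of = {"A": "person", "B": "person", "X": "person",
               "C": "place", "D": "place"}

refined = iterative_filter(triples, category_of)
print(sorted(refined))
```

In this toy run the misfiled entity is removed in the first round, and the second round recomputes the scores on the cleaned data and finds nothing further to remove, which is the sense in which the filtering is iterative.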
Acknowledgements
This work was supported in part by the Scientific Research Project of Beijing Municipal Education Commission (General Social Science Project) [grant number SM201910038010]; National Social Science Fund of China [grant number 19BXW120]; Backup Academic Leaders Grant of Capital University of Economics and Business; and Special Fund of Fundamental Research Expenses of Beijing Municipal University of Capital University of Economics and Business.
Cite this article
Wang, T., Guo, J., Wu, Z. et al. IFTA: Iterative filtering by using TF-AICL algorithm for Chinese encyclopedia knowledge refinement. Appl Intell (2021). https://doi.org/10.1007/s10489-021-02220-w
Keywords
- Knowledge base
- Online encyclopedia
- Knowledge refining
- Iterative algorithm
- Knowledge graph