Abstract
The Linked Open Data (LOD) cloud brings together information described in RDF and stored on the web in (possibly distributed) RDF Knowledge Bases (KBs). The data in these KBs are not necessarily described by a known schema, and it is often extremely time-consuming to query all the interlinked KBs in order to acquire the necessary information. Even when the schema of a KB is known, we still need to know which parts of it are actually used. We solve this problem by summarizing large RDF KBs using top-K approximate RDF graph patterns, which we transform into an RDF schema that describes the contents of the KB. This schema describes the KB accurately, even more accurately than an existing schema, because it reflects the schema actually in use, i.e., the one that corresponds to the existing data. We add information on the number of instances of each pattern, which makes it possible to estimate the expected number of query results. We can then query the RDF graph summary to determine whether the necessary information is present and, if it is present in significant numbers, whether the KB should be included in a federated query.
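The pipeline described in the abstract (summarize a KB as its top-K patterns, annotate each pattern with its number of instances, then consult the summary before routing a federated query) can be illustrated with a small Python sketch. It is not the authors' algorithm: where the paper mines top-K *approximate* patterns, this sketch substitutes exact grouping of subjects by identical predicate sets. The function names, the `ex:` vocabulary, the example triples, and the `min_count` routing threshold are all hypothetical.

```python
from collections import Counter, defaultdict
from collections.abc import Iterable, Iterator

# An RDF triple as plain strings (subject, predicate, object); no RDF library assumed.
Triple = tuple[str, str, str]
Pattern = frozenset[str]

# Hypothetical example data: 3 subjects share one predicate set, 2 share another, 1 is alone.
EXAMPLE_TRIPLES: list[Triple] = [
    ("ex:alice", "rdf:type", "foaf:Person"), ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "rdf:type", "foaf:Person"), ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:carol", "rdf:type", "foaf:Person"), ("ex:carol", "foaf:name", '"Carol"'),
    ("ex:dave", "rdf:type", "foaf:Person"), ("ex:dave", "foaf:name", '"Dave"'),
    ("ex:dave", "foaf:mbox", "<mailto:dave@example.org>"),
    ("ex:erin", "rdf:type", "foaf:Person"), ("ex:erin", "foaf:name", '"Erin"'),
    ("ex:erin", "foaf:mbox", "<mailto:erin@example.org>"),
    ("ex:doc1", "dc:title", '"Report"'),
]


def property_sets(triples: Iterable[Triple]) -> dict[str, Pattern]:
    """Describe each subject by the set of predicates it uses (one binary row per subject)."""
    props: dict[str, set[str]] = defaultdict(set)
    for subject, predicate, _obj in triples:
        props[subject].add(predicate)
    return {s: frozenset(ps) for s, ps in props.items()}


def top_k_patterns(triples: Iterable[Triple], k: int) -> list[tuple[Pattern, int]]:
    """Keep the k most frequent predicate sets, each with its number of instances.

    Exact grouping only: a subject counts toward a pattern only if its predicate
    set is identical, unlike approximate pattern mining, which tolerates noise.
    """
    counts = Counter(property_sets(triples).values())
    return counts.most_common(k)


def estimate_matches(summary: list[tuple[Pattern, int]], required: Iterable[str]) -> int:
    """Count instances of summarized patterns that use every required predicate.

    Each subject belongs to exactly one pattern, so the result is a lower bound:
    subjects whose pattern fell outside the top k are not counted.
    """
    needed = frozenset(required)
    return sum(n for pattern, n in summary if needed <= pattern)


def worth_querying(summary: list[tuple[Pattern, int]], required: Iterable[str],
                   min_count: int) -> bool:
    """Hypothetical routing rule: include a KB only if enough matches are expected."""
    return estimate_matches(summary, required) >= min_count


def to_schema(summary: list[tuple[Pattern, int]]) -> Iterator[Triple]:
    """Render patterns as schema-like triples in a hypothetical ex: vocabulary."""
    for i, (pattern, n) in enumerate(summary, start=1):
        cls = f"ex:Pattern{i}"
        yield (cls, "rdf:type", "rdfs:Class")
        yield (cls, "ex:instanceCount", str(n))
        for predicate in sorted(pattern):
            yield (cls, "ex:usesProperty", predicate)


if __name__ == "__main__":
    summary = top_k_patterns(EXAMPLE_TRIPLES, k=2)
    print(estimate_matches(summary, {"foaf:name"}))  # 5
    print(estimate_matches(summary, {"dc:title"}))   # 0: that pattern was pruned
    print(worth_querying(summary, {"foaf:mbox"}, min_count=2))  # True
    for triple in to_schema(summary):
        print(triple)
```

With `k=2` the single-instance `dc:title` pattern is dropped, so the summary reports zero matches for it; this shows the trade-off the top-K cut-off introduces between summary size and how completely the summary reflects the KB.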
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Zneika, M., Lucchese, C., Vodislav, D., Kotzinos, D. (2016). RDF Graph Summarization Based on Approximate Patterns. In: Grant, E., Kotzinos, D., Laurent, D., Spyratos, N., Tanaka, Y. (eds) Information Search, Integration, and Personalization. ISIP 2015. Communications in Computer and Information Science, vol 622. Springer, Cham. https://doi.org/10.1007/978-3-319-43862-7_4
DOI: https://doi.org/10.1007/978-3-319-43862-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43861-0
Online ISBN: 978-3-319-43862-7
eBook Packages: Computer Science, Computer Science (R0)