Logical Linked Data Compression

  • Amit Krishna Joshi
  • Pascal Hitzler
  • Guozhu Dong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7882)

Abstract

Linked data has grown rapidly in recent years. As structured data continues to proliferate, RDF compression is becoming increasingly important. In this study, we introduce a novel lossless compression technique for RDF datasets, called Rule Based Compression (RB Compression), which compresses a dataset by generating a set of new logical rules from it and removing the triples that can be inferred from these rules. Unlike other compression techniques, our approach exploits not only syntactic verbosity and data redundancy but also the semantic associations present in the RDF graph. Depending on the nature of the dataset, our system is able to prune more than 50% of the original triples without affecting data integrity.
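
The idea of dropping triples that a rule can regenerate can be illustrated with a small sketch. This is not the authors' RB Compression algorithm, only a toy version under stated assumptions: triples are plain `(subject, predicate, object)` tuples, a rule has the form "every subject with item a also has item b" (an item being a predicate-object pair), and only rules that hold without exception are used, so decompression is exact. The names `compress`, `decompress`, and `min_support` are hypothetical.

```python
from collections import defaultdict


def compress(triples, min_support=2):
    """Toy rule-based compression. Returns (rules, kept_triples).

    A rule (a, b) means: every subject that has item a also has item b,
    where an item is a (predicate, object) pair. Assumption: only rules
    that hold for every subject in the data are mined.
    """
    triples = set(triples)
    subjects_by_item = defaultdict(set)
    for s, p, o in triples:
        subjects_by_item[(p, o)].add(s)

    items = sorted(subjects_by_item)  # sorted only for deterministic output
    antecedents, consequents, rules = set(), set(), []
    for a in items:
        # An item that some rule already removes cannot serve as an
        # antecedent, otherwise decompression could not rely on it.
        if len(subjects_by_item[a]) < min_support or a in consequents:
            continue
        for b in items:
            if b == a or b in antecedents:
                continue
            # The rule holds exactly when every subject with a also has b.
            if subjects_by_item[a] <= subjects_by_item[b]:
                rules.append((a, b))
                antecedents.add(a)
                consequents.add(b)

    # Remove every triple that some rule can regenerate.
    removed = {(s, b[0], b[1]) for a, b in rules for s in subjects_by_item[a]}
    return rules, triples - removed


def decompress(rules, kept):
    """Re-derive the removed triples by applying each rule once."""
    subjects_by_item = defaultdict(set)
    for s, p, o in kept:
        subjects_by_item[(p, o)].add(s)
    restored = set(kept)
    for a, b in rules:
        # Antecedent items are never removed, so they are all present in kept.
        for s in subjects_by_item[a]:
            restored.add((s, b[0], b[1]))
    return restored


if __name__ == "__main__":
    data = [
        ("alice", "memberOf", "Dept1"), ("alice", "type", "Student"),
        ("bob", "memberOf", "Dept1"), ("bob", "type", "Student"),
        ("carol", "memberOf", "Dept2"), ("carol", "type", "Student"),
    ]
    rules, kept = compress(data)
    print(len(data), "triples stored as", len(kept), "triples plus", len(rules), "rule(s)")
    assert decompress(rules, kept) == set(data)
```

Because an item used as a rule antecedent is never itself removed, a single pass over the rules is enough to restore the original triple set. A real system would also have to account for the space taken by the rules themselves.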

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

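  • Amit Krishna Joshi, Kno.e.sis Center, Wright State University, Dayton, U.S.A.
  • Pascal Hitzler, Kno.e.sis Center, Wright State University, Dayton, U.S.A.
  • Guozhu Dong, Kno.e.sis Center, Wright State University, Dayton, U.S.A.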
