Predicate Invention Based RDF Data Compression

  • Man Zhu
  • Weixin Wu
  • Jeff Z. Pan
  • Jingyu Han
  • Pengfei Huang
  • Qian Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11341)

Abstract

RDF is a data representation format for schema-free structured information that is gaining momentum in the context of the Semantic Web and the life sciences. With the continuing proliferation of structured data, the demand for RDF compression is growing. In this study, we introduce a novel lossless compression technique for RDF datasets (triples), called PIC (Predicate Invention based Compression). By generating informative predicates and constructing an effective mapping to the original predicates, PIC only needs to store a dramatically reduced number of triples with the newly created predicates, and it can restore the original triples efficiently using the mapping. These predicates are automatically generated by a decomposable forward-backward procedure, which consequently supports very fast parallel bit computation. As a semantic compression method for structured data, PIC goes beyond reducing syntactic verbosity and data redundancy by also exploiting the semantics in the RDF datasets. Experiments on various datasets show competitive results in terms of compression ratio.
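The abstract does not spell out the forward-backward procedure, so the following Python sketch is only a toy illustration of the general idea of predicate invention, not the authors' PIC algorithm. It assumes a deliberately naive rule: subjects that share exactly the same set of (predicate, object) pairs are rewritten as a single triple with an invented predicate, and a mapping from invented predicates back to the original pairs makes the scheme lossless. The names `compress`, `decompress`, `min_support`, the `inv:` prefix, and the sample data are all hypothetical.

```python
from collections import defaultdict

# Placeholder object for triples that use an invented predicate (an assumption of this sketch).
PLACEHOLDER = "_:true"


def compress(triples, min_support=2):
    """Toy predicate invention: replace a shared set of (predicate, object)
    pairs with one invented predicate. Not the PIC procedure from the paper."""
    # Collect the (predicate, object) pairs attached to each subject.
    by_subject = defaultdict(set)
    for s, p, o in triples:
        by_subject[s].add((p, o))

    # Group subjects whose pair sets are identical.
    by_signature = defaultdict(list)
    for s, pairs in by_subject.items():
        by_signature[frozenset(pairs)].append(s)

    mapping = {}      # invented predicate -> original (predicate, object) pairs
    compressed = set()
    for signature, subjects in by_signature.items():
        if len(signature) >= 2 and len(subjects) >= min_support:
            # Assumes no original predicate already uses the "inv:" prefix.
            invented = f"inv:{len(mapping)}"
            mapping[invented] = signature
            for s in subjects:
                compressed.add((s, invented, PLACEHOLDER))
        else:
            # Not worth inventing a predicate: keep the triples unchanged.
            for s in subjects:
                for p, o in signature:
                    compressed.add((s, p, o))
    return compressed, mapping


def decompress(compressed, mapping):
    """Expand invented predicates back into the original triples."""
    restored = set()
    for s, p, o in compressed:
        if p in mapping:
            restored.update((s, p2, o2) for p2, o2 in mapping[p])
        else:
            restored.add((s, p, o))
    return restored


# Hypothetical sample data.
triples = {
    ("alice", "rdf:type", "Person"), ("alice", "livesIn", "Nanjing"),
    ("bob", "rdf:type", "Person"), ("bob", "livesIn", "Nanjing"),
    ("carol", "rdf:type", "Person"), ("carol", "livesIn", "Aberdeen"),
}
compressed, mapping = compress(triples)
restored = decompress(compressed, mapping)
```

In this sample, `alice` and `bob` share a signature and are each stored as one triple, while `carol` keeps her original triples; `restored` equals the input set, which is what "lossless" means here. A realistic method would also have to account for the size of the mapping itself and would find much more general patterns than exact signature matches.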

Notes

Acknowledgement

This work is partially funded by the National Science Foundation of China under grants 61602260 and 61702279.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Man Zhu (1, email author)
  • Weixin Wu (1)
  • Jeff Z. Pan (2)
  • Jingyu Han (1)
  • Pengfei Huang (3)
  • Qian Liu (1)

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China
  2. Department of Computing Science, University of Aberdeen, Aberdeen, UK
  3. College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
