Predicate Invention Based RDF Data Compression

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11341)

Abstract

RDF is a data representation format for schema-free structured information that is gaining momentum in the context of the Semantic Web and the life sciences. With the continuing proliferation of structured data, the need for RDF compression is growing. In this study, we introduce a novel lossless compression technique for RDF datasets (triples), called PIC (Predicate Invention based Compression). By generating informative predicates and constructing an effective mapping from them to the original predicates, PIC needs to store only a dramatically reduced number of triples that use the newly created predicates, and it restores the original triples efficiently using the mapping. These predicates are automatically generated by a decomposable forward-backward procedure, which consequently supports very fast parallel bit computation. As a semantic compression method for structured data, PIC not only reduces syntactic verbosity and data redundancy but also exploits the semantics in RDF datasets. Experiments on various datasets show competitive results in terms of compression ratio.
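The general idea of storing fewer triples under invented predicates and restoring the originals through a mapping can be illustrated with a toy sketch. This is not the PIC algorithm or its forward-backward procedure, which the abstract does not specify. It assumes a much simpler rule: subjects that share an identical set of (predicate, object) pairs get one invented predicate. The names `compress`, `decompress`, `min_support`, `PLACEHOLDER`, and the `inv:` prefix are hypothetical.

```python
from collections import defaultdict

# Hypothetical filler object for triples that carry an invented predicate.
PLACEHOLDER = "_:none"


def compress(triples, min_support=2):
    """Replace shared (predicate, object) sets with one invented predicate.

    Assumption for illustration only: a set of pairs is worth abstracting
    when it has more than one pair and at least `min_support` subjects
    carry exactly that set.
    """
    by_subject = defaultdict(set)
    for s, p, o in triples:
        by_subject[s].add((p, o))

    # Group subjects by their exact set of (predicate, object) pairs.
    groups = defaultdict(list)
    for s, pairs in by_subject.items():
        groups[frozenset(pairs)].append(s)

    mapping = {}   # invented predicate -> original (predicate, object) pairs
    stored = set()  # triples that actually need to be kept
    for pairs, subjects in groups.items():
        if len(pairs) > 1 and len(subjects) >= min_support:
            invented = f"inv:p{len(mapping)}"
            mapping[invented] = pairs
            stored.update((s, invented, PLACEHOLDER) for s in subjects)
        else:
            stored.update((s, p, o) for s in subjects for p, o in pairs)
    return stored, mapping


def decompress(stored, mapping):
    """Expand invented predicates back into the original triples."""
    restored = set()
    for s, p, o in stored:
        if p in mapping:
            restored.update((s, p2, o2) for p2, o2 in mapping[p])
        else:
            restored.add((s, p, o))
    return restored


# Toy data: three subjects share two facts, a fourth subject has one fact.
triples = {
    ("a", "type", "City"), ("a", "country", "FR"),
    ("b", "type", "City"), ("b", "country", "FR"),
    ("c", "type", "City"), ("c", "country", "FR"),
    ("d", "type", "City"),
}
stored, mapping = compress(triples)
restored = decompress(stored, mapping)
```

In this toy input, seven triples become four stored triples plus one mapping entry, and `decompress` returns the original set, which is the lossless property the abstract describes. The sketch assumes invented predicate names never collide with predicates already in the data.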

Acknowledgement

This work is partially funded by the National Science Foundation of China under grants 61602260 and 61702279.

Author information

Corresponding author

Correspondence to Man Zhu.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhu, M., Wu, W., Pan, J.Z., Han, J., Huang, P., Liu, Q. (2018). Predicate Invention Based RDF Data Compression. In: Ichise, R., Lecue, F., Kawamura, T., Zhao, D., Muggleton, S., Kozaki, K. (eds) Semantic Technology. JIST 2018. Lecture Notes in Computer Science, vol 11341. Springer, Cham. https://doi.org/10.1007/978-3-030-04284-4_11

  • DOI: https://doi.org/10.1007/978-3-030-04284-4_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04283-7

  • Online ISBN: 978-3-030-04284-4

  • eBook Packages: Computer Science, Computer Science (R0)
