SparkRDF: In-Memory Distributed RDF Management Framework for Large-Scale Social Data

  • Zhichao Xu
  • Wei Chen (email author)
  • Lei Gai
  • Tengjiao Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9098)


Considering scalability and semantic requirements, the Resource Description Framework (RDF) and its de facto query language SPARQL are well suited to managing and querying online social network (OSN) data. Although existing work has introduced distributed frameworks for querying large-scale data, improving online query performance remains a challenging task. To address this problem, this paper proposes a scalable RDF data framework that uses a key-value store for offline RDF storage and a pipelined, in-memory query strategy. The proposed framework efficiently supports SPARQL Basic Graph Pattern (BGP) queries on large-scale datasets. Experiments on the benchmark dataset demonstrate that the online SPARQL query performance of our framework outperforms that of existing distributed RDF solutions.
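To make the notion of a Basic Graph Pattern concrete, here is a minimal single-machine sketch of BGP evaluation. It is not the framework proposed in the paper: the key-value storage layout and the pipelined distributed execution are not modeled, and the function names (`match_pattern`, `evaluate_bgp`) and the toy social triples are assumptions introduced purely for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Binding = Dict[str, str]        # variable name -> bound value


def is_var(term: str) -> bool:
    """Terms starting with '?' are treated as query variables."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`; return None on conflict."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable if it is new; otherwise it must agree with the old value.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate_bgp(patterns: List[Triple], triples: Iterable[Triple]) -> List[Binding]:
    """Evaluate a BGP by joining its triple patterns one at a time (naive nested loops)."""
    data = list(triples)
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in data:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical toy social graph (illustrative data only).
triples = [
    ("alice", "follows", "bob"),
    ("bob", "follows", "carol"),
    ("alice", "livesIn", "Beijing"),
    ("carol", "livesIn", "Beijing"),
]

# BGP: ?x follows ?y . ?y follows ?z
query = [("?x", "follows", "?y"), ("?y", "follows", "?z")]
result = evaluate_bgp(query, triples)
print(result)
```

The shared variable `?y` is what turns the two triple patterns into a join. A distributed engine would replace the nested loops with indexed lookups and partitioned joins, but the set of variable bindings it has to produce is the same.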


RDF, SPARQL, Social networks, Query processing





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Zhichao Xu (1, 2)
  • Wei Chen (1, 2), email author
  • Lei Gai (1, 2)
  • Tengjiao Wang (1, 2)

  1. Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing, China
  2. School of Electronics Engineering and Computer Science, Peking University, Beijing, China
