Framework-Based Scale-Out RDF Systems

  • Reference work entry
Encyclopedia of Big Data Technologies

Synonyms

Hadoop-based RDF query processors; Spark-based RDF query processors

Definitions

RDF, the Resource Description Framework, has been recognized as a de facto standard for describing resources in a semi-structured manner. In particular, RDF is a graph-based format that allows named links between resources to be defined in the form of triples (subject, predicate, object), also called statements. A statement expresses a relationship (defined by the predicate) between two resources (the subject and the object). The relationship is directional: it always points from subject to object. The same resource can appear in multiple triples, playing the same or different roles; for example, it can be the subject of one triple and the predicate or object of another. This ability makes it possible to define multiple connections between triples and hence to create a connected graph of data. Such a graph can be represented as nodes that stand for the resources and edges capturing the relationships between the...
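
The triple model described above can be illustrated with a minimal sketch in Python. This is an illustration only, not code from any of the systems covered by this entry: the `Triple` type, the `build_graph` and `roles` helpers, and every `ex:` resource name are hypothetical and chosen purely for the example. It shows triples as directed, labeled edges and a resource (`ex:knows`) that acts as a predicate in one statement and as a subject in another.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: a directed, labeled edge from subject to object."""
    subject: str
    predicate: str
    obj: str


def build_graph(triples):
    """Group triples by subject: subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.predicate, t.obj))
    return dict(graph)


def roles(resource, triples):
    """Return the set of roles a resource plays across all triples."""
    found = set()
    for t in triples:
        if t.subject == resource:
            found.add("subject")
        if t.predicate == resource:
            found.add("predicate")
        if t.obj == resource:
            found.add("object")
    return found


# Hypothetical example data; the "ex:" names are placeholders.
TRIPLES = [
    Triple("ex:alice", "ex:knows", "ex:bob"),
    Triple("ex:bob", "ex:worksFor", "ex:acme"),
    Triple("ex:knows", "rdfs:label", '"knows"'),
]

if __name__ == "__main__":
    for subject, edges in build_graph(TRIPLES).items():
        for predicate, obj in edges:
            print(f"{subject} --{predicate}--> {obj}")
    print(sorted(roles("ex:bob", TRIPLES)))
    print(sorted(roles("ex:knows", TRIPLES)))
```

Because `ex:bob` is the object of the first triple and the subject of the second, the two statements share a node and form a connected graph, which is the property the definition relies on.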

Author information

Corresponding authors

Correspondence to Marcin Wylot or Sherif Sakr.

Copyright information

© 2019 Springer Nature Switzerland AG

About this entry

Cite this entry

Wylot, M., Sakr, S. (2019). Framework-Based Scale-Out RDF Systems. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_225
