Abstract
The recent big data movement has resulted in a surge of activity on layering declarative languages on top of distributed computation platforms. The Semantic Web realm has not seen a comparable surge of analytics languages, despite the significant growth in available RDF data. Consequently, when analysing large RDF datasets, users are left with two main options: using SPARQL or using an existing non-RDF-specific big data language, each with its own limitations. The purely declarative nature of SPARQL and its high evaluation cost can be limiting in some scenarios. On the other hand, existing big data languages are designed mainly for tabular data; applying them to RDF data therefore results in verbose, unreadable, and sometimes inefficient scripts. In this paper, we introduce SYRql, a dataflow language designed to process RDF data at large scale. SYRql blends concepts from both SPARQL and existing big data languages. We formally define a closed algebra that underlies SYRql and discuss its properties and some unique optimisation opportunities that this algebra provides. Furthermore, we describe an implementation that translates SYRql scripts into a series of MapReduce jobs, and we compare its performance to that of other big data processing languages.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Maali, F., Ravindra, P., Anyanwu, K., Decker, S. (2014). SYRql: A Dataflow Language for Large Scale Processing of RDF Data. In: Mika, P., et al. The Semantic Web – ISWC 2014. ISWC 2014. Lecture Notes in Computer Science, vol 8796. Springer, Cham. https://doi.org/10.1007/978-3-319-11964-9_10
DOI: https://doi.org/10.1007/978-3-319-11964-9_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11963-2
Online ISBN: 978-3-319-11964-9
eBook Packages: Computer Science, Computer Science (R0)