Abstract
RDF (Resource Description Framework) is a proposed standard for knowledge representation, and relational databases are widely adopted for RDF data management. To evaluate SPARQL queries over RDF data efficiently, the legacy query optimizer needs to be reconsidered. One vital problem is how to handle the suboptimal query plans caused by error-prone cardinality estimation. For RDF data, determining an optimal execution order before the query is actually evaluated is costly, or even infeasible. In this paper, we propose ROSIE, a Runtime Optimization framework that iteratively re-optimizes the SPARQL query plan according to the actual cardinality derived from Incremental partial query Evaluation. By introducing an approach for heuristic-based plan generation, as well as a mechanism to detect cardinality estimation errors at runtime, ROSIE efficiently mitigates the problem of biased cardinality propagation. Extensive experiments on real and benchmark data show that, compared to state-of-the-art systems, ROSIE consistently performs better on complex queries, by orders of magnitude.
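To make the idea of re-optimizing from actual cardinalities concrete, the following is a minimal Python sketch of a greedy, incremental evaluation loop over triple patterns. It is not ROSIE's algorithm: the function names (`match`, `estimate`, `evaluate_incrementally`), the in-memory triple list, and the simple counting heuristic are all assumptions made for illustration. After each pattern is evaluated, the remaining patterns are re-ranked using the values actually observed in the intermediate result rather than estimates fixed before execution.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    out = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if out.setdefault(p, t) != t:  # conflicts with an earlier binding
                return None
        elif p != t:
            return None
    return out


def extend(bindings: List[Binding], pattern: Triple, data: List[Triple]) -> List[Binding]:
    """Join the intermediate result with one more triple pattern."""
    result = []
    for b in bindings:
        for t in data:
            b2 = match(pattern, t, b)
            if b2 is not None:
                result.append(b2)
    return result


def estimate(pattern: Triple, values: Dict[str, Set[str]], data: List[Triple]) -> int:
    """Hypothetical heuristic: count triples compatible with the constants of
    `pattern` and with the values already observed for its bound variables."""
    def ok(p: str, t: str) -> bool:
        if is_var(p):
            return p not in values or t in values[p]
        return p == t
    return sum(all(ok(p, t) for p, t in zip(pattern, triple)) for triple in data)


def evaluate_incrementally(patterns: List[Triple], data: List[Triple]):
    bindings: List[Binding] = [{}]
    remaining = list(patterns)
    order = []  # (pattern, actual cardinality after evaluating it)
    while remaining:
        # Values actually observed so far drive the next planning decision.
        values: Dict[str, Set[str]] = {}
        for b in bindings:
            for var, v in b.items():
                values.setdefault(var, set()).add(v)
        # Prefer patterns sharing a bound variable, to avoid cross products.
        connected = [p for p in remaining if any(is_var(x) and x in values for x in p)]
        nxt = min(connected or remaining, key=lambda p: estimate(p, values, data))
        remaining.remove(nxt)
        bindings = extend(bindings, nxt, data)
        order.append((nxt, len(bindings)))
    return bindings, order


# Toy data and query, invented for this example.
data = [
    ("alice", "knows", "bob"), ("alice", "knows", "carol"), ("bob", "knows", "carol"),
    ("bob", "livesIn", "paris"), ("carol", "livesIn", "rome"), ("alice", "livesIn", "paris"),
]
query = [("?x", "knows", "?y"), ("?y", "livesIn", "rome"), ("?x", "livesIn", "paris")]
result, order = evaluate_incrementally(query, data)
```

In this toy run the most selective pattern is evaluated first, and the choice of each later pattern depends on the bindings produced so far. A real system would replace the full scans with index lookups and would re-plan only when the observed cardinality deviates from the estimate.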
Notes
- 1.
RDF Specification, https://www.w3.org/RDF/.
- 2.
SPARQL 1.1 Specification, http://www.w3.org/TR/sparql11-query.
- 3.
DBpedia, http://wiki.dbpedia.org/.
- 4.
Freebase, https://www.freebase.com/.
- 5.
UniProt, ftp://ftp.uniprot.org/.
- 6.
Syntax for Query Variables. https://www.w3.org/TR/sparql11-query/#QSynVariables.
- 7.
For readability consideration, we replaced the URIs with more readable names in \(\mathcal {Q}_e\).
- 8.
RASQAL. https://github.com/dajobe/rasqal.
- 9.
We implicitly assume that a TP T has at least one match, \(|T| \ge 1\).
- 10.
- 11.
- 12.
Downloaded from http://wiki.dbpedia.org/Downloads2015-04.
- 13.
Available at https://github.com/gh-rdf3x/gh-rdf3x/.
- 14.
Available at http://grid.hust.edu.cn/triplebit/TripleBit.tar.gz.
- 15.
Available at https://github.com/openlink/virtuoso-opensource.
- 16.
Available at https://github.com/Quetzal-RDF/quetzal/.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Gai, L., Wang, X., Wang, T. (2018). ROSIE: Runtime Optimization of SPARQL Queries over RDF Using Incremental Evaluation. In: Liu, W., Giunchiglia, F., Yang, B. (eds) Knowledge Science, Engineering and Management. KSEM 2018. Lecture Notes in Computer Science, vol 11062. Springer, Cham. https://doi.org/10.1007/978-3-319-99247-1_11
DOI: https://doi.org/10.1007/978-3-319-99247-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99246-4
Online ISBN: 978-3-319-99247-1
eBook Packages: Computer Science, Computer Science (R0)