ROSIE: Runtime Optimization of SPARQL Queries over RDF Using Incremental Evaluation

  • Lei Gai
  • Xiaoming Wang
  • Tengjiao Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11062)

Abstract

RDF (Resource Description Framework) is a proposed standard for knowledge representation, and relational databases are widely adopted for RDF data management. To evaluate SPARQL queries over RDF data efficiently, legacy query optimizers need to be reconsidered. One vital problem is how to handle suboptimal query plans caused by error-prone cardinality estimation. For RDF data, determining an optimal execution order before the query is actually evaluated is costly, or even infeasible. In this paper, we propose ROSIE, a Runtime Optimization framework that iteratively re-optimizes the SPARQL query plan according to the actual cardinality derived from Incremental partial query Evaluation. By introducing an approach for heuristic-based plan generation, as well as a mechanism to detect cardinality estimation errors at runtime, ROSIE relieves the problem of biased cardinality propagation in an efficient way. Extensive experiments on real and benchmark data show that, compared to the state of the art, ROSIE consistently performs better on complex queries, by orders of magnitude.
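The general idea of re-optimizing from actual cardinalities can be illustrated with a small sketch. The code below is not ROSIE's algorithm. It is a simplified toy, written under several assumptions: triples are held in an in-memory list, the initial plan simply sorts triple patterns by a caller-supplied (and possibly wrong) fan-out guess, and estimation errors are detected with a q-error threshold. When the threshold is exceeded, the remaining patterns are re-estimated on a sample of the actual intermediate results and re-ordered. All names (`match`, `extend`, `q_error`, `evaluate`, `fanout`, `threshold`, `sample_size`) are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables
Binding = Dict[str, str]


def match(pattern: Pattern, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Return the binding extended so that pattern matches triple, else None."""
    out = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if out.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return out


def extend(bindings: List[Binding], pattern: Pattern, data: List[Triple]) -> List[Binding]:
    """Join the current partial results with one more triple pattern."""
    result = []
    for b in bindings:
        for t in data:
            m = match(pattern, t, b)
            if m is not None:
                result.append(m)
    return result


def q_error(estimated: float, actual: float) -> float:
    """Symmetric ratio between estimated and actual cardinality (at least 1)."""
    e, a = max(estimated, 1.0), max(actual, 1.0)
    return max(e / a, a / e)


def evaluate(patterns, data, fanout, threshold=2.0, sample_size=8):
    """Run patterns one at a time and re-plan when an estimate proves wrong.

    fanout maps each pattern to a guessed number of output rows per input row.
    Returns (execution order, final bindings, number of re-plans).
    """
    fanout = dict(fanout)
    plan = sorted(patterns, key=lambda p: fanout[p])  # static initial plan
    bindings, order, replans = [{}], [], 0
    while plan and bindings:
        step = plan.pop(0)
        expected = len(bindings) * fanout[step]
        bindings = extend(bindings, step, data)  # incremental partial evaluation
        order.append(step)
        if plan and bindings and q_error(expected, len(bindings)) > threshold:
            # Estimate was badly off: measure the remaining patterns on a
            # sample of the actual intermediate result and re-order them.
            sample = bindings[:sample_size]
            for p in plan:
                fanout[p] = len(extend(sample, p, data)) / len(sample)
            plan.sort(key=lambda p: fanout[p])
            replans += 1
    return order, bindings, replans


# Toy data and query (all values are made up for illustration).
data = [
    ("alice", "knows", "bob"), ("alice", "knows", "carol"),
    ("bob", "knows", "carol"), ("carol", "knows", "alice"),
    ("alice", "livesIn", "paris"), ("bob", "livesIn", "rome"),
    ("carol", "livesIn", "paris"),
    ("paris", "inCountry", "france"), ("lyon", "inCountry", "france"),
    ("rome", "inCountry", "italy"),
]
p1 = ("?x", "knows", "?y")
p2 = ("?y", "livesIn", "?c")
p3 = ("?c", "inCountry", "france")
guesses = {p1: 1.0, p2: 2.0, p3: 1.5}  # deliberately inaccurate fan-outs

order, rows, replans = evaluate([p1, p2, p3], data, guesses)
print(order, len(rows), replans)
```

In this toy run the static plan would execute `p1`, `p3`, `p2`. Because the first step returns four rows where one was expected, the remaining patterns are re-measured and `p2` is moved ahead of `p3`, which avoids the cross product that `p3` would produce while `?c` is still unbound.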

Keywords

SPARQL, RDF, Query optimization, Cardinality estimation, Runtime optimization

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Electrical Engineering and Computer Science, Peking University, Beijing, China
  2. CTBT Beijing National Data Center and Beijing Radionuclide Laboratory, Beijing, China
