To Cache or Not To Cache: The Effects of Warming Cache in Complex SPARQL Queries

Lampo, Tomas; Vidal, María-Esther; Danilow, Juan; Ruckhaus, Edna

doi:10.1007/978-3-642-25106-1_22

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7045)

Included in the following conference series:

OTM Confederated International Conferences "On the Move to Meaningful Internet Systems"

Abstract

Existing RDF engines have developed caching techniques that store intermediate results and reuse them in later steps of query execution; execution is thus sped up by avoiding repeated computation of the same results. Although these techniques can be beneficial for many real-world queries, the same effects may not be observed in complex queries. In particular, queries composed of a large number of graph patterns that require the computation of large sets of intermediate results that cannot be reused, or queries that require complex computations to produce small amounts of data, may require further reordering or grouping to make effective use of the cache. In this paper, we address the problem of determining which types of SPARQL queries can benefit from caching data during query execution or from warming up the cache. We report experimental results showing that complex queries can take advantage of the cache if they are reordered and grouped into small star-shaped groups; complex queries not only comprise a large number of patterns, but may also produce a large number of intermediate results. Although the results are preliminary, they clearly show that star-shaped group queries can reduce execution time by up to three orders of magnitude when run in warm cache, while the original queries may exhibit poor performance in warm cache.
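
The idea of grouping a query into star-shaped groups and reusing cached intermediate results can be illustrated with a toy sketch. This is not the authors' engine or their evaluation method: the in-memory triple list, the function names (`star_groups`, `eval_group`, `run_query`), the rule of grouping patterns by shared subject, and the dictionary-based cache are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "type", "Person"),
    ("alice", "worksAt", "usb"),
    ("bob", "type", "Person"),
    ("bob", "worksAt", "umd"),
    ("usb", "locatedIn", "Caracas"),
    ("umd", "locatedIn", "CollegePark"),
]

def is_var(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")

def star_groups(patterns):
    """Assumed grouping rule: patterns sharing a subject form one star."""
    groups = defaultdict(list)
    for pattern in patterns:
        groups[pattern[0]].append(pattern)
    return [tuple(group) for group in groups.values()]

def match(pattern, binding):
    """Yield extensions of `binding` that satisfy one triple pattern."""
    for triple in TRIPLES:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if is_var(term):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield new

CACHE = {}                           # star group -> list of bindings
STATS = {"hits": 0, "misses": 0}

def eval_group(group):
    """Evaluate one star-shaped group, reusing a cached result if present."""
    if group in CACHE:
        STATS["hits"] += 1
        return CACHE[group]
    STATS["misses"] += 1
    bindings = [{}]
    for pattern in group:
        bindings = [b2 for b in bindings for b2 in match(pattern, b)]
    CACHE[group] = bindings
    return bindings

def join(left, right):
    """Nested-loop join of two binding lists on their shared variables."""
    out = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.append({**l, **r})
    return out

def run_query(patterns):
    """Evaluate each star group, then join the group results."""
    result = [{}]
    for group in star_groups(patterns):
        result = join(result, eval_group(group))
    return result

QUERY = [
    ("?p", "type", "Person"),
    ("?p", "worksAt", "?org"),
    ("?org", "locatedIn", "?city"),
]

cold = run_query(QUERY)   # first run: both star groups are computed
warm = run_query(QUERY)   # second run: both groups come from the cache
```

In this sketch the first run computes and stores the result of each star group, and the second run finds both in the cache. The grouping matters because small groups yield compact results that can be stored and reused, whereas a single large pattern list would be cached only as a whole.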

Author information

Authors and Affiliations

University of Maryland, College Park, USA
Tomas Lampo
Universidad Simón Bolívar, Caracas, Venezuela
María-Esther Vidal, Juan Danilow & Edna Ruckhaus

Editor information

Editors and Affiliations

STAR Lab, Vrije Universiteit Brussel (VUB), Bldg G/10, Pleinlaan 2, 1050, Brussel, Belgium
Robert Meersman
DEBII, Curtin University of Technology, Technology Park, De Laeter Way, 6102, Bentley, WA, Australia
Tharam Dillon
Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660, Boadilla del Monte, Madrid, Spain
Pilar Herrero
Smeal College of Business, Pennsylvania State University, University Park, 16802, P.O. Box, PA, U.S.A.
Akhil Kumar
Institute of Databases and Information Systems, Ulm University, Germany
Manfred Reichert
City University of Hong Kong, Hong Kong
Li Qing
National University of Singapore (NUS), Singapore
Beng-Chin Ooi
Dipartimento Tecnologie dell’Informazione, Università degli Studi di Milano, Via Bramante 65, 26013, Crema, Italy
Ernesto Damiani
Vanderbilt University, VU, Station B #1829, 2015 Terrace Place, 37203, Nashville, TN, USA
Douglas C. Schmidt
Virginia Tech, 24060, Blacksburg, VA
Jules White
Digital Enterprise Research Institute (DERI), National University of Ireland, IDA Business Park, Lower Dangan, Galway, Ireland
Manfred Hauswirth
Kno.e.sis Center, Wright State University, Dayton, Ohio
Pascal Hitzler
IBM India Research Lab, 4, Block C, Institutional Area, 110 070, Vasant Kunj, New Delhi, India
Mukesh Mohania

About this paper

Cite this paper

Lampo, T., Vidal, ME., Danilow, J., Ruckhaus, E. (2011). To Cache or Not To Cache: The Effects of Warming Cache in Complex SPARQL Queries. In: Meersman, R., et al. On the Move to Meaningful Internet Systems: OTM 2011. OTM 2011. Lecture Notes in Computer Science, vol 7045. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25106-1_22

DOI: https://doi.org/10.1007/978-3-642-25106-1_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25105-4
Online ISBN: 978-3-642-25106-1
eBook Packages: Computer Science, Computer Science (R0)
