Abstract
Resource Description Framework (RDF) has been used as a general model for conceptual description and information modelling. As the number and volume of RDF datasets have grown in recent years, many techniques have been developed to accelerate query answering on triple stores, which handle large-scale RDF data. Caching is one popular solution. Non-RDBMS-based triple stores, which exploit the intrinsic graph nature of RDF, have been attracting increasing research attention. However, because their underlying structure differs from that of RDBMS-based triple stores, they cannot reuse the RDBMS caching mechanism. In this paper, we develop a time-aware, frequency-based caching algorithm to address this issue. Our approach retrieves the accessed triples by analyzing and expanding previous queries, and collects the most frequently accessed triples by estimating their access frequencies using Exponential Smoothing, a forecasting method. We evaluate our approach using real-world queries from a publicly available SPARQL endpoint. Our theoretical analysis and empirical results show that the proposed approach achieves higher hit rates than state-of-the-art approaches.
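To make the idea of estimating access frequencies with Exponential Smoothing more concrete, the following is a minimal Python sketch, not the algorithm from the paper. It assumes the standard recurrence E_t = alpha * x_t + (1 - alpha) * E_{t-1}, where x_t is 1 if a triple is accessed at time step t and 0 otherwise, and it assumes one time step per query. The class name HotTripleTracker, its methods, and the default alpha are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class HotTripleTracker:
    """Illustrative tracker; names and defaults are assumptions, not the paper's."""

    def __init__(self, alpha: float = 0.05) -> None:
        if not 0.0 < alpha < 1.0:
            raise ValueError("alpha must lie strictly between 0 and 1")
        self.alpha = alpha
        self.clock = 0  # one tick per recorded query (an assumption)
        # triple -> (estimate at last access, time step of last access)
        self._state: Dict[Triple, Tuple[float, int]] = {}

    def record_query(self, accessed: Iterable[Triple]) -> None:
        """Advance the clock and update every triple touched by this query."""
        self.clock += 1
        for triple in set(accessed):
            est, last = self._state.get(triple, (0.0, self.clock))
            gap = self.clock - last
            # Steps without an access contribute x_t = 0, so the old estimate
            # decays by (1 - alpha) per step; the access itself adds alpha.
            est = self.alpha + est * (1.0 - self.alpha) ** gap
            self._state[triple] = (est, self.clock)

    def estimate(self, triple: Triple) -> float:
        """Smoothed access frequency of a triple, decayed to the current step."""
        est, last = self._state.get(triple, (0.0, self.clock))
        return est * (1.0 - self.alpha) ** (self.clock - last)

    def hottest(self, k: int) -> List[Triple]:
        """The k triples with the highest current estimates (cache candidates)."""
        return sorted(self._state, key=self.estimate, reverse=True)[:k]
```

Storing only the last estimate and the last access time lets each triple be updated lazily when it is next accessed, rather than decaying every estimate on every query. A cache could then be filled with the result of hottest(k), where k is the cache capacity in triples.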
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhang, W.E., Sheng, Q.Z., Taylor, K., Qin, Y. (2015). Identifying and Caching Hot Triples for Efficient RDF Query Processing. In: Renz, M., Shahabi, C., Zhou, X., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9050. Springer, Cham. https://doi.org/10.1007/978-3-319-18123-3_16
DOI: https://doi.org/10.1007/978-3-319-18123-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18122-6
Online ISBN: 978-3-319-18123-3
eBook Packages: Computer Science (R0)