Abstract
Thanks to the rapid growth of memory capacity, it is now feasible to perform query processing entirely in memory. Nevertheless, because main memory is substantially more expensive than most secondary storage devices, including HDDs and SSDs, it is not suitable for storing cold data. Therefore, hybrid data storage composed of both memory and secondary storage is expected to remain popular for the foreseeable future. In this paper, we introduce a query optimization model for hybrid data storage. Unlike traditional query processors, which treat either main memory as a cache or secondary storage as an anti-cache, our model performs semantic data partitioning between memory and secondary storage. Query optimization can thus take the partitioning of data into account to achieve enhanced performance. We conducted an experimental evaluation on a columnar query engine to demonstrate the advantage of the proposed approach.
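The abstract does not give the details of the model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each partition is described by a key range on a single integer column and is placed on either a memory or a disk tier; a planner can then use those descriptions to decide which partitions a range query must scan and whether it has to touch disk at all. The names `Range`, `Partition`, `partitions_to_scan`, `touches_disk`, and the example layout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Half-open interval [lo, hi) over a single integer column (assumed)."""
    lo: int
    hi: int

    def overlaps(self, other: "Range") -> bool:
        # Two half-open intervals intersect when each starts before the other ends.
        return self.lo < other.hi and other.lo < self.hi


@dataclass(frozen=True)
class Partition:
    name: str
    tier: str      # "memory" or "disk" (hypothetical tier labels)
    covers: Range  # semantic description of which key values live here


def partitions_to_scan(partitions, query):
    """Return the partitions whose key range intersects the query range."""
    return [p for p in partitions if p.covers.overlaps(query)]


def touches_disk(partitions, query):
    """True if answering the query requires reading any disk-resident partition."""
    return any(p.tier == "disk" for p in partitions_to_scan(partitions, query))


# Hypothetical layout: hot keys are kept in memory, cold keys on disk.
layout = [
    Partition("hot", "memory", Range(1000, 2000)),
    Partition("cold", "disk", Range(0, 1000)),
]

hot_query = Range(1500, 1600)   # falls entirely inside the hot partition
mixed_query = Range(900, 1100)  # straddles the hot/cold boundary

if __name__ == "__main__":
    for q in (hot_query, mixed_query):
        names = [p.name for p in partitions_to_scan(layout, q)]
        print(q, names, "disk" if touches_disk(layout, q) else "memory only")
```

In this sketch the partition descriptions act as predicates known to the planner, so a query confined to the hot range can be planned as a purely in-memory scan, while a query crossing the boundary is known in advance to need both tiers.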
Acknowledgement
This work is partially supported by the Chinese National High-tech R&D Program (863 Program) (2015AA015307) and the NSFC Project (No. 61272138).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Yu, A., Meng, Q., Zhou, X., Shen, B., Zhang, Y. (2017). Query Optimization on Hybrid Storage. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10177. Springer, Cham. https://doi.org/10.1007/978-3-319-55753-3_23
DOI: https://doi.org/10.1007/978-3-319-55753-3_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55752-6
Online ISBN: 978-3-319-55753-3
eBook Packages: Computer Science, Computer Science (R0)