SINGLE vs. MapReduce vs. Relational: Predicting Query Execution Time
Over the past decades, several new concepts have emerged for organizing and querying data in large Data Warehouse (DW) systems, all with the same primary objective: optimizing processing speed. More recently, with the rise of the Big Data concept, storage cost has dropped significantly and performance (random access) has increased, particularly with modern SSDs. This paper introduces and tests a storage alternative that goes against current data normalization premises, on the assumption that storage space is no longer a concern. By de-normalizing the entire data schema (transparently to the user), we propose a new system concept, called SINGLE, in which query execution time is meant to be entirely predictable, independently of query complexity. The proposed data model also allows easy partitioning and distributed processing to enable parallel execution and boost performance, as happens in MapReduce. The TPC-H benchmark is used to evaluate storage space and query performance. Results show predictable performance when compared with approaches based on a normalized relational schema and with MapReduce-oriented approaches.
Keywords: Predictable, Query execution, Data warehouse, MapReduce, Normalization, De-normalization, Distributed, Relational
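The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tables, column names, and values are hypothetical, and a local thread pool stands in for distributed workers. It assumes that the joins are paid once at load time, so that every query becomes a full scan over a single wide table that can be split into partitions and scanned in parallel, with partial results combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical normalized tables (names and values are illustrative only).
customers = {
    1: {"name": "Ana", "nation": "PT"},
    2: {"name": "Bo", "nation": "SE"},
}
orders = [
    {"order_id": 10, "cust_id": 1, "total": 50.0},
    {"order_id": 11, "cust_id": 2, "total": 20.0},
    {"order_id": 12, "cust_id": 1, "total": 30.0},
]


def denormalize(orders, customers):
    """Join once at load time so each row carries every attribute."""
    return [{**o, **customers[o["cust_id"]]} for o in orders]


def partition(rows, n):
    """Split rows round-robin into n partitions."""
    return [rows[i::n] for i in range(n)]


def scan_partition(rows, predicate, value):
    """Full scan of one partition: filter, then partial sum (the 'map' step)."""
    return sum(value(r) for r in rows if predicate(r))


def query(partitions, predicate, value):
    """Scan all partitions in parallel and add up the partial sums (the 'reduce' step)."""
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(lambda p: scan_partition(p, predicate, value), partitions))


single = denormalize(orders, customers)
parts = partition(single, 2)

# Total order value for customers from one nation; no join is needed at query time.
revenue_pt = query(parts, lambda r: r["nation"] == "PT", lambda r: r["total"])
print(revenue_pt)
```

In this sketch every query touches every row exactly once, whatever its predicate, so the work per query depends on the table size and the number of partitions rather than on how many joins a normalized schema would have required. The price is the duplicated customer attributes in each order row.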
This work is financed by national funds through FCT - Fundação para a Ciência e Tecnologia, I.P., under the project UID/Multi/04016/2016. Furthermore, we would like to thank the Instituto Politécnico de Viseu and CI&DETS for their support.