Abstract
Operating on computer clusters, parallel databases enjoy enhanced performance. However, the scalability of a parallel database is limited by a number of factors. Although MapReduce-based systems are highly scalable, their performance is not satisfactory for data-intensive applications. In this paper, we explore the feasibility of building a data warehouse that incorporates the best features of both technologies – the efficiency of parallel databases and the scalability and fault tolerance of MapReduce. To this end, we design a prototype system called LinearDB. LinearDB organizes data in a decomposed snowflake schema and adopts three operations – transform, reduce and merge – to accomplish query processing. All these techniques are specially designed for the cluster environment. Our experimental results show that LinearDB's scalability matches that of MapReduce and that its performance is up to 3 times as good as that of PostgreSQL.
This work is partly supported by the Important National Science & Technology Specific Projects of China ("HGJ" Projects, Grant No.2010ZX01042-001-002), the National Natural Science Foundation of China (Grant No.61070054), the Fundamental Research Funds for the Central Universities (the Research Funds of Renmin University of China, Grant No.10XNI018), the Renmin University of China (Grant No.10XNB053), and the Graduate Science Foundation of Renmin University of China (Grant No.10XNH096 and No.11XNH120).
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, H., Qin, X., Zhang, Y., Wang, S., Wang, Z. (2011). LinearDB: A Relational Approach to Make Data Warehouse Scale Like MapReduce. In: Yu, J.X., Kim, M.H., Unland, R. (eds) Database Systems for Advanced Applications. DASFAA 2011. Lecture Notes in Computer Science, vol 6588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20152-3_23
DOI: https://doi.org/10.1007/978-3-642-20152-3_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20151-6
Online ISBN: 978-3-642-20152-3
eBook Packages: Computer Science, Computer Science (R0)