LinearDB: A Relational Approach to Make Data Warehouse Scale Like MapReduce

Wang, Huiju; Qin, Xiongpai; Zhang, Yansong; Wang, Shan; Wang, Zhanwei

doi:10.1007/978-3-642-20152-3_23

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6588)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Operating on computer clusters, parallel databases enjoy enhanced performance. However, the scalability of a parallel database is limited by a number of factors. Although MapReduce-based systems are highly scalable, their performance is not satisfactory for data-intensive applications. In this paper, we explore the feasibility of building a data warehouse that incorporates the best features of both technologies – the efficiency of parallel databases and the scalability and fault tolerance of MapReduce. To this end, we design a prototype system called LinearDB. LinearDB organizes data in a decomposed snowflake schema and adopts three operations – transform, reduce and merge – to accomplish query processing. All of these techniques are specially designed for the cluster environment. Our experimental results show that its scalability matches that of MapReduce and its performance is up to 3 times as good as that of PostgreSQL.

This work is partly supported by the Important National Science & Technology Specific Projects of China ("HGJ" Projects, Grant No.2010ZX01042-001-002), the National Natural Science Foundation of China (Grant No.61070054), the Fundamental Research Funds for the Central Universities (the Research Funds of Renmin University of China, Grant No.10XNI018), the Renmin University of China (Grant No.10XNB053), and the Graduate Science Foundation of Renmin University of China (Grant No.10XNH096 and No.11XNH120).

Author information

Authors and Affiliations

Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), MOE, Beijing, 100872, P.R. China
Huiju Wang, Xiongpai Qin, Yansong Zhang, Shan Wang & Zhanwei Wang
School of Information, Renmin University of China, Beijing, 100872, P.R. China
Huiju Wang, Xiongpai Qin, Yansong Zhang, Shan Wang & Zhanwei Wang

Editor information

Editors and Affiliations

Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
Jeffrey Xu Yu
Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro (373-1 Guseong-don), 305-701, Yuseong-gu, Daejeon, Korea
Myoung Ho Kim
Institute for Computer Science and Business Information Systems (ICB), University of Duisburg-Essen, Schützenbahn 70, 45117, Essen, Germany
Rainer Unland

About this paper

Cite this paper

Wang, H., Qin, X., Zhang, Y., Wang, S., Wang, Z. (2011). LinearDB: A Relational Approach to Make Data Warehouse Scale Like MapReduce. In: Yu, J.X., Kim, M.H., Unland, R. (eds) Database Systems for Advanced Applications. DASFAA 2011. Lecture Notes in Computer Science, vol 6588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20152-3_23

DOI: https://doi.org/10.1007/978-3-642-20152-3_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20151-6
Online ISBN: 978-3-642-20152-3
eBook Packages: Computer Science, Computer Science (R0)
