Abstract
Big data brings not only constantly growing data volumes, dynamic and elastic storage demands, and diversified data structures, but also different data characteristics. Alongside traditional dense data, more and more "sparse" data has emerged and now accounts for the majority of massive data sets. The challenge is to adapt to the characteristics of sparse data without losing sight of the traits of dense data. To meet these differentiated storage demands and to express the semantics of absent values properly, we propose a three-layered storage structure named "Dynamic Table" to represent incomplete data. Our approach takes into account the distributed storage requirements of the cloud and supports a hybrid row and column layout, which allows users to mix and match the two physical storage formats on demand. In addition, the original semantics of absent values is divided into two parts that receive distinct treatments; specifically, a four-valued logic is introduced. Experiments on synthetic and real-world data sets demonstrate that our approach combines the advantages of columnar storage with the merits of row-oriented storage, and that distinguishing the semantics of absent values is necessary to describe missing values in sparse data sets.
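The two ideas in the abstract, a per-group choice between row and column layout and two distinct kinds of absent value, can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: `ColumnGroup`, the markers `Absent.UNKNOWN` and `Absent.INAPPLICABLE`, and the `equals` predicate are assumptions for illustration only, and the abstract does not specify the paper's three layers or the truth tables of its four-valued logic.

```python
from enum import Enum

class Absent(Enum):
    """Two hypothetical markers for an absent value (assumed names)."""
    UNKNOWN = "unknown"            # a value exists but was not recorded
    INAPPLICABLE = "inapplicable"  # the attribute does not apply to the record

class Truth(Enum):
    """Possible results of a predicate under a four-valued logic."""
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"
    INAPPLICABLE = "inapplicable"

def equals(cell, literal):
    """Evaluate `cell == literal`, propagating the kind of absence."""
    if cell is Absent.UNKNOWN:
        return Truth.UNKNOWN
    if cell is Absent.INAPPLICABLE:
        return Truth.INAPPLICABLE
    return Truth.TRUE if cell == literal else Truth.FALSE

class ColumnGroup:
    """A set of attributes stored either row-wise or column-wise."""
    def __init__(self, attributes, layout):
        if layout not in ("row", "column"):
            raise ValueError("layout must be 'row' or 'column'")
        self.attributes = list(attributes)
        self.layout = layout
        # Row layout: a list of tuples. Column layout: one list per attribute.
        self.data = [] if layout == "row" else {a: [] for a in self.attributes}

    def append(self, record):
        # Assumption: an attribute missing from the record counts as UNKNOWN.
        values = [record.get(a, Absent.UNKNOWN) for a in self.attributes]
        if self.layout == "row":
            self.data.append(tuple(values))
        else:
            for a, v in zip(self.attributes, values):
                self.data[a].append(v)

    def read(self, row_id, attribute):
        if self.layout == "row":
            return self.data[row_id][self.attributes.index(attribute)]
        return self.data[attribute][row_id]

# Dense attributes are kept together in a row group; a sparse one gets a column group.
groups = [ColumnGroup(["id", "name"], "row"), ColumnGroup(["isbn"], "column")]
records = [
    {"id": 1, "name": "book", "isbn": "978-3"},
    {"id": 2, "name": "shirt", "isbn": Absent.INAPPLICABLE},
    {"id": 3, "name": "novel"},  # isbn not supplied
]
for record in records:
    for group in groups:
        group.append(record)

results = [equals(groups[1].read(i, "isbn"), "978-3") for i in range(3)]
```

In this sketch a predicate on the sparse `isbn` attribute yields a different result for a record that has no ISBN by nature than for one whose ISBN simply was not recorded, which is the distinction a single NULL marker cannot express.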
This work is supported by the National Science and Technology Major Program for Core Electronic Devices, High-end Generic Chips and Basic Software Project of China under Grant Nos. 2010ZX01042-001-003-05 and 2010ZX01042-002-002-02, and by the Natural Science Foundation of China (NSFC) under Grant Nos. 60973002 and 61170003.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cheng, X. et al. (2012). Dynamic Table: A Layered and Configurable Storage Structure in the Cloud. In: Bao, Z., et al. Web-Age Information Management. WAIM 2012. Lecture Notes in Computer Science, vol 7419. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33050-6_21
DOI: https://doi.org/10.1007/978-3-642-33050-6_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33049-0
Online ISBN: 978-3-642-33050-6
eBook Packages: Computer Science, Computer Science (R0)