Abstract
We are now entering the era of big data. HBase organizes data as key-value pairs and supports fast queries on rowkeys, but queries on non-rowkey columns are a blind spot of HBase. The main topic of this paper is providing high-performance query capability on non-rowkey columns. An effective secondary index model is proposed, and the prototype system CinHBa is implemented. Furthermore, a novel caching policy, the Hotscore Algorithm, is introduced in CinHBa to cache the hottest index data in memory and improve query performance. Experimental evaluation on 10M records shows that the query response time of CinHBa is far lower than that of native HBase without a secondary index. CinHBa also shows good data scalability.
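To make the idea of a secondary index on a key-value store concrete, the following is a minimal in-memory sketch. It is not the CinHBa design and does not use the HBase API: the class `TinyKVTable` and its methods are hypothetical names chosen for illustration, and the Hotscore caching policy is not modeled because the abstract does not specify it. The sketch only contrasts a full scan on a non-rowkey column with a lookup through an index that maps column values back to rowkeys.

```python
from collections import defaultdict


class TinyKVTable:
    """Toy in-memory stand-in for a key-value table (hypothetical, not the HBase API)."""

    def __init__(self):
        self.rows = {}                 # rowkey -> {column: value}
        self.index = defaultdict(set)  # (column, value) -> set of rowkeys

    def put(self, rowkey, columns):
        # Drop stale index entries if this rowkey is being overwritten.
        for col, val in self.rows.get(rowkey, {}).items():
            self.index[(col, val)].discard(rowkey)
        self.rows[rowkey] = dict(columns)
        # Maintain the secondary index alongside the data write.
        for col, val in columns.items():
            self.index[(col, val)].add(rowkey)

    def scan_by_column(self, column, value):
        """No index: every row must be inspected (full table scan)."""
        return sorted(k for k, cols in self.rows.items()
                      if cols.get(column) == value)

    def lookup_by_column(self, column, value):
        """With index: one index lookup yields the matching rowkeys."""
        return sorted(self.index.get((column, value), ()))


table = TinyKVTable()
table.put("user1", {"city": "Nanjing", "age": 30})
table.put("user2", {"city": "Beijing", "age": 25})
table.put("user3", {"city": "Nanjing", "age": 41})

# Both paths return the same rowkeys; only the amount of work differs.
print(table.scan_by_column("city", "Nanjing"))
print(table.lookup_by_column("city", "Nanjing"))
```

In this sketch the index is updated on every write, so reads on a non-rowkey column avoid touching unrelated rows at the cost of extra work per `put`. A real system would also have to decide where the index is stored and which parts of it are kept in memory, which is the question a caching policy addresses.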
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ge, W. et al. (2014). CinHBa: A Secondary Index with Hotscore Caching Policy on Key-Value Data Store. In: Luo, X., Yu, J.X., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2014. Lecture Notes in Computer Science, vol 8933. Springer, Cham. https://doi.org/10.1007/978-3-319-14717-8_47
DOI: https://doi.org/10.1007/978-3-319-14717-8_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14716-1
Online ISBN: 978-3-319-14717-8
eBook Packages: Computer Science (R0)