CinHBa: A Secondary Index with Hotscore Caching Policy on Key-Value Data Store

Ge, Wei; Huang, Yihua; Zhao, Di; Luo, Shengmei; Yuan, Chunfeng; Zhou, Wenhui; Tang, Yun; Zhou, Juan

doi:10.1007/978-3-319-14717-8_47


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8933)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications


Abstract

We are now entering the era of big data. HBase has emerged to organize data as key-value pairs and to support fast queries on rowkeys, but queries on non-rowkey columns are a blind spot of HBase. The main topic of this paper is providing high-performance queries on non-rowkey columns. An effective secondary index model is proposed, and the prototype system CinHBa is implemented. Furthermore, a novel caching policy, the Hotscore Algorithm, is introduced in CinHBa to cache the hottest index data in memory and thereby improve query performance. Experimental evaluation on 10M records shows that the query response time of CinHBa is far lower than that of native HBase without a secondary index. CinHBa also shows good data scalability.
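The abstract does not describe CinHBa's index layout or the Hotscore formula. What follows is therefore only a minimal sketch of the general idea, under stated assumptions. A toy in-memory table stands in for HBase. A separate index maps values of one non-rowkey column to rowkeys. A bounded cache keeps the index entries with the highest "hot score". The names `KVStore`, `SecondaryIndex`, and `HotScoreCache` are hypothetical, as is the scoring rule (an exponentially decayed access count). None of them is the authors' design or part of the HBase API.

```python
from collections import defaultdict


class KVStore:
    """Toy stand-in for an HBase table (rowkey -> {column: value}).
    Hypothetical; this is not the HBase client API."""

    def __init__(self):
        self.rows = {}

    def put(self, rowkey, column, value):
        self.rows.setdefault(rowkey, {})[column] = value

    def get(self, rowkey):
        return self.rows.get(rowkey, {})

    def scan_by_column(self, column, value):
        # No secondary index: a non-rowkey query has to touch every row.
        return sorted(k for k, cols in self.rows.items() if cols.get(column) == value)


class HotScoreCache:
    """Hypothetical bounded cache of index entries. The score is an
    exponentially decayed access count, an assumption for illustration,
    not the paper's Hotscore formula."""

    def __init__(self, capacity, decay=0.9):
        self.capacity = capacity
        self.decay = decay
        self.entries = {}                 # index value -> cached rowkeys
        self.scores = defaultdict(float)  # index value -> hot score

    def touch(self, key):
        # Age every score (O(n) here for simplicity), then credit this access.
        for k in self.scores:
            self.scores[k] *= self.decay
        self.scores[key] += 1.0

    def get(self, key):
        return self.entries.get(key)

    def put(self, key, rowkeys):
        self.entries[key] = rowkeys
        if len(self.entries) > self.capacity:
            # Evict the cached entry with the lowest hot score.
            coldest = min(self.entries, key=lambda k: self.scores[k])
            del self.entries[coldest]

    def invalidate(self, key):
        self.entries.pop(key, None)


class SecondaryIndex:
    """Index on one non-rowkey column: value -> set of rowkeys."""

    def __init__(self, store, column, cache):
        self.store, self.column, self.cache = store, column, cache
        self.index = defaultdict(set)

    def put(self, rowkey, column, value):
        old = self.store.get(rowkey).get(column)
        self.store.put(rowkey, column, value)
        if column != self.column:
            return
        if old is not None:
            self.index[old].discard(rowkey)
            self.cache.invalidate(old)
        self.index[value].add(rowkey)
        self.cache.invalidate(value)  # keep cached results consistent with the index

    def query(self, value):
        self.cache.touch(value)
        cached = self.cache.get(value)
        if cached is not None:
            return cached
        rowkeys = sorted(self.index.get(value, ()))
        self.cache.put(value, rowkeys)
        return rowkeys


store = KVStore()
idx = SecondaryIndex(store, "city", HotScoreCache(capacity=2))
for i, city in enumerate(["Nanjing", "Guilin", "Nanjing", "Shanghai"]):
    idx.put(f"user{i:03d}", "city", city)

print(idx.query("Nanjing"))                     # ['user000', 'user002']
print(store.scan_by_column("city", "Nanjing"))  # ['user000', 'user002']
```

Both calls return the same rowkeys. `scan_by_column` reads every row, which is the cost of a non-rowkey query without an index. The index turns that scan into a single lookup, and the cache can skip even the lookup for frequently queried values. Decaying every score on each access keeps the sketch short; a real system would more likely decay lazily or in batches.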



Author information

Authors and Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210046, China
Wei Ge, Yihua Huang, Di Zhao, Chunfeng Yuan, Wenhui Zhou, Yun Tang & Juan Zhou
ZTE Corporation, Nanjing, 210012, China
Shengmei Luo
Guangxi Normal University, Guilin, 541000, China
Wei Ge


Editor information

Editors and Affiliations

Sun Yat-sen University, Guangzhou, P.R. China
Xudong Luo
The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Jeffrey Xu Yu
Guangxi Normal University, Guilin, P.R. China
Zhi Li


About this paper

Cite this paper

Ge, W. et al. (2014). CinHBa: A Secondary Index with Hotscore Caching Policy on Key-Value Data Store. In: Luo, X., Yu, J.X., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2014. Lecture Notes in Computer Science, vol 8933. Springer, Cham. https://doi.org/10.1007/978-3-319-14717-8_47


DOI: https://doi.org/10.1007/978-3-319-14717-8_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14716-1
Online ISBN: 978-3-319-14717-8
eBook Packages: Computer Science, Computer Science (R0)
