CinHBa: A Secondary Index with Hotscore Caching Policy on Key-Value Data Store

  • Conference paper
Advanced Data Mining and Applications (ADMA 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8933)

Abstract

We are entering the era of big data. HBase organizes data as key-value pairs and supports fast queries on rowkeys, but queries on non-rowkey columns are a blind spot of HBase. The main topic of this paper is to provide high-performance query capability on non-rowkey columns. An effective secondary index model is proposed, and the prototype system CinHBa is implemented. Furthermore, a novel caching policy, the Hotscore Algorithm, is introduced in CinHBa to cache the hottest index data in memory and improve query performance. Experimental evaluation on 10M records shows that the query response time of CinHBa is far lower than that of native HBase without a secondary index. In addition, CinHBa shows good data scalability.
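To make the idea in the abstract concrete, the following is a minimal sketch of a secondary index over a key-value store with a score-based cache for hot index entries. It is not the paper's implementation: the class `IndexedStore`, its methods, the in-memory dictionaries standing in for HBase tables, and the decayed-frequency score are all assumptions chosen for illustration, since the abstract does not specify how the Hotscore Algorithm computes its scores.

```python
from collections import defaultdict


class IndexedStore:
    """Toy key-value store with a secondary index on one non-rowkey column.

    Hypothetical illustration only: plain dicts stand in for HBase tables,
    and the scoring rule is an assumed decayed access count.
    """

    def __init__(self, indexed_column, cache_capacity=2, decay=0.9):
        self.rows = {}                     # rowkey -> {column: value}
        self.index = defaultdict(set)      # column value -> rowkeys (index table)
        self.indexed_column = indexed_column
        self.cache = {}                    # hot index entries held in memory
        self.scores = defaultdict(float)   # assumed hotness score per value
        self.cache_capacity = cache_capacity
        self.decay = decay

    def put(self, rowkey, row):
        """Write a row and keep the index (and cache) consistent."""
        old = self.rows.get(rowkey)
        if old is not None and self.indexed_column in old:
            old_value = old[self.indexed_column]
            self.index[old_value].discard(rowkey)
            self.cache.pop(old_value, None)    # invalidate stale cache entry
        self.rows[rowkey] = row
        if self.indexed_column in row:
            value = row[self.indexed_column]
            self.index[value].add(rowkey)
            self.cache.pop(value, None)

    def scan_query(self, value):
        """Baseline without an index: full scan over every row."""
        return sorted(k for k, r in self.rows.items()
                      if r.get(self.indexed_column) == value)

    def index_query(self, value):
        """Look up rowkeys through the index, serving hot values from cache."""
        for v in self.scores:                  # age every existing score
            self.scores[v] *= self.decay
        self.scores[value] += 1.0              # reward the queried value
        if value in self.cache:
            return sorted(self.cache[value])
        rowkeys = set(self.index.get(value, ()))
        self._admit(value, rowkeys)
        return sorted(rowkeys)

    def _admit(self, value, rowkeys):
        """Cache the entry, evicting the coldest one only if it scores lower."""
        if len(self.cache) >= self.cache_capacity:
            coldest = min(self.cache, key=lambda v: self.scores[v])
            if self.scores[coldest] >= self.scores[value]:
                return
            del self.cache[coldest]
        self.cache[value] = rowkeys
```

In this sketch, `scan_query` touches every row, while `index_query` reads one index entry, which mirrors the gap between native scans and index lookups that the abstract reports. A frequently queried value keeps a high score and stays cached even when a rarely queried value is looked up once.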

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ge, W. et al. (2014). CinHBa: A Secondary Index with Hotscore Caching Policy on Key-Value Data Store. In: Luo, X., Yu, J.X., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2014. Lecture Notes in Computer Science, vol 8933. Springer, Cham. https://doi.org/10.1007/978-3-319-14717-8_47

  • DOI: https://doi.org/10.1007/978-3-319-14717-8_47

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14716-1

  • Online ISBN: 978-3-319-14717-8

  • eBook Packages: Computer Science, Computer Science (R0)
