An Efficient Bulk Loading Approach of Secondary Index in Distributed Log-Structured Data Stores

Zhu, Yanchao; Zhang, Zhao; Cai, Peng; Qian, Weining; Zhou, Aoying

doi:10.1007/978-3-319-55753-3_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10177)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Improving the read performance of the Log-Structured Merge (LSM) tree has attracted much attention recently, and building secondary indexes is a popular solution for LSM data stores. Bulk loading of a secondary index is inevitable when a new application is developed on an existing LSM data store. However, to the best of our knowledge, few studies address bulk loading of secondary indexes in distributed LSM-tree data stores. In this paper, we study how to improve the performance of bulk loading of secondary indexes in distributed LSM-tree data stores and propose an efficient bulk loading approach. First, we design a secondary index structure based on the distributed LSM-tree to guarantee the scalability and consistency of the secondary index. Second, we propose an efficient framework for bulk loading of secondary indexes in a distributed environment, which provides good load balancing for query processing by using an equal-depth histogram to capture the data distribution. Theoretical analysis and experimental results on a standard benchmark illustrate the efficacy of the proposed methods in a distributed environment.
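
The abstract does not say how the equal-depth histogram is built or applied, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes that a sample of secondary-key values is available and that index entries are range-partitioned across a fixed number of nodes. The function names `equal_depth_boundaries` and `partition_of` are hypothetical and chosen for illustration.

```python
from bisect import bisect_right
from collections import Counter


def equal_depth_boundaries(sample_keys, num_partitions):
    """Return num_partitions - 1 split points such that each range
    holds roughly the same number of sampled keys (equal depth)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    ordered = sorted(sample_keys)
    if not ordered:
        raise ValueError("sample_keys must not be empty")
    n = len(ordered)
    # The i-th boundary is the key at the i/num_partitions quantile.
    return [ordered[(i * n) // num_partitions] for i in range(1, num_partitions)]


def partition_of(key, boundaries):
    """Map a key to a partition id; keys equal to a boundary go to the
    partition on the right of that boundary."""
    return bisect_right(boundaries, key)


# Assumed example data: a skewed key distribution (squares of 0..99).
sample = [x * x for x in range(100)]
boundaries = equal_depth_boundaries(sample, 4)
load = Counter(partition_of(k, boundaries) for k in sample)
print(boundaries, dict(load))
```

With this skewed sample, equal-width ranges would place most keys in the first range, whereas the equal-depth split points assign 25 keys to each of the four partitions. Heavily duplicated key values can still unbalance the ranges, which this sketch does not address.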

Acknowledgements

This work is partially supported by the National High-tech R&D Program (863 Program) under grant number 2015AA015307, the National Science Foundation of China under grant numbers 61402180, 61432006 and 61672232, the Natural Science Foundation of Shanghai under grant number 14ZR1412600, and the Guangxi Key Laboratory of Trusted Software (kx201602). The corresponding author is Zhao Zhang.

Author information

Authors and Affiliations

School of Data Science and Engineering, East China Normal University, Shanghai, China
Yanchao Zhu, Zhao Zhang, Peng Cai, Weining Qian & Aoying Zhou
School of Computer Science and Software Engineering, East China Normal University, Shanghai, China
Zhao Zhang

Corresponding author

Correspondence to Zhao Zhang.

Editor information

Editors and Affiliations

Arizona State University, Tempe - Phoenix, Arizona, USA
Selçuk Candan
Hong Kong University of Science and Tech, Hong Kong, China
Lei Chen
Aalborg University, Aalborg, Denmark
Torben Bach Pedersen
University of New South Wales, Sydney, New South Wales, Australia
Lijun Chang
The University of Queensland, Brisbane, Queensland, Australia
Wen Hua

About this paper

Cite this paper

Zhu, Y., Zhang, Z., Cai, P., Qian, W., Zhou, A. (2017). An Efficient Bulk Loading Approach of Secondary Index in Distributed Log-Structured Data Stores. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10177. Springer, Cham. https://doi.org/10.1007/978-3-319-55753-3_6

DOI: https://doi.org/10.1007/978-3-319-55753-3_6
Published: 22 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55752-6
Online ISBN: 978-3-319-55753-3
eBook Packages: Computer Science, Computer Science (R0)
