Apara: Workload-Aware Data Partition and Replication for Parallel Databases

  • Conference paper
Web and Big Data (APWeb-WAIM 2019)

Abstract

Data partition and replication mechanisms directly determine query execution patterns in parallel database systems and therefore have a great impact on system performance. Several workload-aware data storage techniques have been proposed recently, but they suffer from limited support for complex workloads or from high storage requirements. To support complex analytical workloads over massive distributed database systems, we design and implement a workload-aware data partition and replication tool called Apara. We design two heuristic algorithms and define two cost models for effective partition calculation and efficient use of replicas. We run a set of experiments to compare the performance of Apara with that of other representative work. The results show that Apara consistently outperforms the main existing solutions on TPC-H workloads.

Acknowledgment

This work is supported by National Key Projects (No. 2018YFB1003404) and the National Science Foundation of China (No. 61572194).

Author information

Corresponding author

Correspondence to Rong Zhang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, X., Zhang, C., Li, Y., Zhang, R., Zhou, A. (2019). Apara: Workload-Aware Data Partition and Replication for Parallel Databases. In: Shao, J., Yiu, M., Toyoda, M., Zhang, D., Wang, W., Cui, B. (eds) Web and Big Data. APWeb-WAIM 2019. Lecture Notes in Computer Science, vol 11642. Springer, Cham. https://doi.org/10.1007/978-3-030-26075-0_15

  • DOI: https://doi.org/10.1007/978-3-030-26075-0_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26074-3

  • Online ISBN: 978-3-030-26075-0

  • eBook Packages: Computer Science, Computer Science (R0)
