Apara: Workload-Aware Data Partition and Replication for Parallel Databases

  • Xiaolei Zhang
  • Chunxi Zhang
  • Yuming Li
  • Rong Zhang (Email author)
  • Aoying Zhou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11642)

Abstract

Data partition and replication mechanisms directly determine query execution patterns in parallel database systems and therefore have a great impact on system performance. Several workload-aware data storage techniques have been proposed recently, but they either offer limited support for complex workloads or require large amounts of storage. To support complex analytical workloads over massive distributed database systems, we design and implement a workload-aware data partition and replication tool called Apara. We design two heuristic algorithms and define two cost models for effective data partition calculation and efficient use of replicas. We run a set of experiments to compare the performance of Apara with that of other representative work. The results show that Apara consistently outperforms the primary solutions on TPC-H workloads.

Keywords

Distributed database, Workload-aware storage, Partition, Replication

Notes

Acknowledgment

We are supported by National Key Projects (No. 2018YFB1003404) and National Science Foundation of China (No. 61572194).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Xiaolei Zhang (2)
  • Chunxi Zhang (2)
  • Yuming Li (2)
  • Rong Zhang (1, 2), Email author
  • Aoying Zhou (2)
  1. International Research Center of Trustworthy Software, Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai, China
  2. School of Data Science and Engineering, East China Normal University, Shanghai, China
