
Towards Dynamic Data Placement for Polystore Ingestion

  • Conference paper

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 337)

Abstract

Integrating low-latency data streaming into data warehouse architectures has become an important enhancement to support modern data warehousing applications. In these architectures, heterogeneous workloads with data ingestion and analytical queries must be executed with strict performance guarantees. Furthermore, the data warehouse may consist of multiple different types of storage engines (a.k.a. polystores or multi-stores). A paramount problem is data placement: different workload scenarios call for different data placement designs. Moreover, workload conditions change frequently. In this paper, we provide evidence that a dynamic, workload-driven approach is needed for data placement in polystores with low-latency data ingestion support. We study the problem based on the characteristics of the TPC-DI benchmark in the context of a simplified polystore that consists of S-Store and Postgres.
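The placement trade-off described in the abstract can be illustrated with a small workload-driven cost model. This is a hypothetical sketch, not the authors' method: the per-operation costs in `ENGINE_COSTS`, the `migration_cost` threshold, and the names `WorkloadWindow`, `placement_cost`, `choose_placement`, and `replan` are assumptions chosen for illustration. Once per observed workload window, it estimates whether a table is cheaper to serve from S-Store or Postgres, and it moves the table only when the estimated savings exceed an assumed migration cost.

```python
from dataclasses import dataclass

# Hypothetical per-operation costs in arbitrary units; these are NOT measured
# values. Assumption: the streaming engine (S-Store) ingests cheaply but answers
# analytical queries expensively, and the analytical engine (Postgres) is the reverse.
ENGINE_COSTS = {
    "S-Store": {"ingest": 1.0, "query": 10.0},
    "Postgres": {"ingest": 5.0, "query": 2.0},
}


@dataclass
class WorkloadWindow:
    """Observed operation counts for one table during one time window."""
    ingests: int
    queries: int


def placement_cost(engine: str, window: WorkloadWindow) -> float:
    """Estimated cost of serving one window's workload from a single engine."""
    costs = ENGINE_COSTS[engine]
    return window.ingests * costs["ingest"] + window.queries * costs["query"]


def choose_placement(window: WorkloadWindow) -> str:
    """Return the engine with the lowest estimated cost for this window."""
    return min(ENGINE_COSTS, key=lambda engine: placement_cost(engine, window))


def replan(history: list[WorkloadWindow], current: str = "Postgres",
           migration_cost: float = 500.0) -> list[str]:
    """Re-evaluate placement after each window as the workload changes.

    The table moves only when the estimated savings for the window exceed the
    (hypothetical) one-off cost of migrating it to the other engine.
    """
    plan = []
    for window in history:
        best = choose_placement(window)
        savings = placement_cost(current, window) - placement_cost(best, window)
        if best != current and savings > migration_cost:
            current = best
        plan.append(current)
    return plan


if __name__ == "__main__":
    windows = [WorkloadWindow(1000, 10), WorkloadWindow(10, 1000), WorkloadWindow(100, 100)]
    print(replan(windows))  # ['S-Store', 'Postgres', 'Postgres']
```

With these assumed costs, an ingestion-heavy window (1000 ingests, 10 queries) favors S-Store, while a query-heavy window (10 ingests, 1000 queries) favors Postgres. The migration threshold is a simple guard against oscillating between engines when the workload shifts only slightly; a realistic design would need measured costs and would also have to account for queries that span both engines.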

Acknowledgments

We thank Renee J. Miller and Boris Glavic for reviewing the work. We also thank the anonymous reviewers and the BIRTE 2017 workshop attendees for their helpful suggestions. This research is funded in part by a Bell Canada Fellowship, NSERC, the Intel Science and Technology Center for Big Data, and the NSF under grant NSF IIS-1111423.

Author information

Correspondence to Jiang Du.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Du, J., Meehan, J., Tatbul, N., Zdonik, S. (2019). Towards Dynamic Data Placement for Polystore Ingestion. In: Castellanos, M., Chrysanthis, P., Pelechrinis, K. (eds) Real-Time Business Intelligence and Analytics. BIRTE 2015, BIRTE 2016, BIRTE 2017. Lecture Notes in Business Information Processing, vol 337. Springer, Cham. https://doi.org/10.1007/978-3-030-24124-7_13

  • DOI: https://doi.org/10.1007/978-3-030-24124-7_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-24123-0

  • Online ISBN: 978-3-030-24124-7

  • eBook Packages: Computer Science, Computer Science (R0)
