Abstract
Integrating low-latency data streaming into data warehouse architectures has become an important enhancement to support modern data warehousing applications. In these architectures, heterogeneous workloads that combine data ingestion and analytical queries must be executed with strict performance guarantees. Furthermore, the data warehouse may consist of multiple different types of storage engines (a.k.a. polystores or multi-stores). A paramount problem is data placement: different workload scenarios call for different data placement designs, and workload conditions change frequently. In this paper, we provide evidence that a dynamic, workload-driven approach is needed for data placement in polystores with low-latency data ingestion support. We study the problem based on the characteristics of the TPC-DI benchmark in the context of an abbreviated polystore that consists of S-Store and Postgres.
Acknowledgments
We thank Renee J. Miller and Boris Glavic for reviewing the work. We also thank the anonymous reviewers and the BIRTE 2017 workshop attendees for their helpful suggestions. This research is funded in part by a Bell Canada Fellowship, NSERC, the Intel Science and Technology Center for Big Data, and the NSF under grant NSF IIS-1111423.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Du, J., Meehan, J., Tatbul, N., Zdonik, S. (2019). Towards Dynamic Data Placement for Polystore Ingestion. In: Castellanos, M., Chrysanthis, P., Pelechrinis, K. (eds) Real-Time Business Intelligence and Analytics. BIRTE 2015, BIRTE 2016, BIRTE 2017. Lecture Notes in Business Information Processing, vol 337. Springer, Cham. https://doi.org/10.1007/978-3-030-24124-7_13
DOI: https://doi.org/10.1007/978-3-030-24124-7_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24123-0
Online ISBN: 978-3-030-24124-7
eBook Packages: Computer Science, Computer Science (R0)