Abstract
Traditional data warehouses mostly run on large and expensive server and storage systems. For small and medium-sized companies, such systems are often too expensive to implement and run. The SaaS model is attractive in this situation, since these companies can opt to run their OLAP as a service. The challenge for the analytics service provider is then to minimize TCO by consolidating as many tenants onto as few servers as possible, a technique often referred to as multi-tenancy.
In this article, we report three results from our research on building a cluster of multi-tenant main-memory column databases for analytics as a service. For this purpose, we ported SAP’s in-memory column database TREX to run in the Amazon cloud. First, we evaluated the relation between the data size of a tenant and the number of queries per second, and derived a formula that allows us to estimate how many tenants with different sizes and request rates can be placed on one instance of our main-memory database. Second, we discuss findings on cost/performance tradeoffs between reliably storing the data of a tenant on a single node using highly available network-attached storage, such as Amazon EBS, and replicating tenant data to a secondary node where the data resides on less resilient storage. Third, we describe a mechanism that supports historical queries across older snapshots of tenant data, which are lazy-loaded from Amazon’s S3 near-line archiving storage and cached on the local VM disks.
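The lazy-loading of historical snapshots described in the abstract can be illustrated with a minimal read-through cache. This is only a sketch under stated assumptions, not the mechanism used in the paper: the class name `SnapshotCache`, the `tenant/snapshot` key layout, the `.bin` file naming, and the `fetch` callable (a stand-in for a download from an archive store such as S3) are all hypothetical.

```python
import os
import tempfile
from typing import Callable


class SnapshotCache:
    """Read-through cache: a snapshot is fetched from the archive on first use only."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], bytes]) -> None:
        self.cache_dir = cache_dir
        # Hypothetical stand-in for an archive download (e.g. from S3).
        self.fetch = fetch
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, tenant: str, snapshot: str) -> str:
        # Assumed naming scheme for cached files on the local disk.
        return os.path.join(self.cache_dir, f"{tenant}__{snapshot}.bin")

    def load(self, tenant: str, snapshot: str) -> bytes:
        path = self._path(tenant, snapshot)
        if os.path.exists(path):
            # Cache hit: serve the snapshot from the local disk.
            with open(path, "rb") as f:
                return f.read()
        # Cache miss: fetch lazily from the archive.
        data = self.fetch(f"{tenant}/{snapshot}")
        # Write to a temporary file and rename it, so that a partially
        # written file is never mistaken for a cached snapshot.
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
        return data


def demo() -> int:
    """Load the same snapshot twice and return how often the archive was hit."""
    calls = []

    def fake_archive(key: str) -> bytes:
        calls.append(key)
        return b"snapshot:" + key.encode()

    with tempfile.TemporaryDirectory() as cache_dir:
        cache = SnapshotCache(cache_dir, fake_archive)
        cache.load("tenant-a", "2010-01")
        cache.load("tenant-a", "2010-01")
    return len(calls)
```

In this sketch the second `load` call is served from the local disk, so `demo()` touches the archive only once. Eviction of cached snapshots and concurrent access are deliberately left out.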
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schaffner, J., Eckart, B., Schwarz, C., Brunnert, J., Jacobs, D., Zeier, A. (2011). Towards Analytics-as-a-Service Using an In-Memory Column Database. In: Agrawal, D., Candan, K.S., Li, WS. (eds) New Frontiers in Information and Software as Services. Lecture Notes in Business Information Processing, vol 74. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19294-4_11
DOI: https://doi.org/10.1007/978-3-642-19294-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19293-7
Online ISBN: 978-3-642-19294-4
eBook Packages: Computer Science, Computer Science (R0)