Towards Analytics-as-a-Service Using an In-Memory Column Database

  • Conference paper
New Frontiers in Information and Software as Services

Abstract

Traditional data warehouses mostly run on large, expensive server and storage systems. For small and medium-sized companies, such systems are often too expensive to implement and run. The SaaS model is attractive in this situation, since these companies can instead opt to run their OLAP as a service. The challenge for the analytics service provider is then to minimize total cost of ownership (TCO) by consolidating as many tenants onto as few servers as possible, a technique often referred to as multi-tenancy.
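To make the consolidation goal concrete, the following is a minimal sketch that treats tenant placement as a bin-packing problem and solves it with a first-fit decreasing heuristic. This framing is an assumption made for illustration and is not taken from the paper: the `Server` class, the `place_tenants` function, and the use of memory footprint as the only constraint are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server with a fixed main-memory budget (assumption)."""
    capacity_gb: float
    tenants: list = field(default_factory=list)
    used_gb: float = 0.0

    def fits(self, size_gb: float) -> bool:
        return self.used_gb + size_gb <= self.capacity_gb

    def add(self, name: str, size_gb: float) -> None:
        self.tenants.append(name)
        self.used_gb += size_gb


def place_tenants(tenant_sizes_gb: dict, capacity_gb: float) -> list:
    """First-fit decreasing: place the largest tenants first, each on the
    first server that still has room, opening a new server only if needed."""
    servers = []
    ordered = sorted(tenant_sizes_gb.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        if size > capacity_gb:
            raise ValueError(f"tenant {name} does not fit on any single server")
        target = next((s for s in servers if s.fits(size)), None)
        if target is None:
            target = Server(capacity_gb)
            servers.append(target)
        target.add(name, size)
    return servers


if __name__ == "__main__":
    sizes = {"a": 6.0, "b": 5.0, "c": 4.0, "d": 3.0, "e": 2.0}
    for i, server in enumerate(place_tenants(sizes, capacity_gb=10.0)):
        print(i, server.tenants, server.used_gb)
```

A realistic placement would also have to account for each tenant's request rate, since memory footprint alone does not determine how much load a server can sustain.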

In this article, we report three results from our research on building a cluster of multi-tenant main memory column databases for analytics as a service. For this purpose we ported SAP’s in-memory column database TREX to run in the Amazon cloud. First, we evaluated the relation between the data size of a tenant and the number of queries per second, and created a formula that allows us to estimate how many tenants of different sizes and request rates can be placed on one instance of our main memory database. Second, we discuss findings on cost/performance tradeoffs between reliably storing the data of a tenant on a single node using highly available network-attached storage, such as Amazon EBS, and replicating tenant data to a secondary node where the data resides on less resilient storage. Third, we describe a mechanism to support historical queries across older snapshots of tenant data, which are lazy-loaded from Amazon’s S3 near-line archiving storage and cached on the local VM disks.
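The lazy-loading idea for historical snapshots can be sketched as a read-through cache on local disk. This is only an illustration under stated assumptions, not the mechanism used in the paper: `SnapshotCache` is a hypothetical name, the `fetch` callable stands in for whatever retrieves a snapshot from archive storage, and snapshot identifiers are assumed to be plain strings that are safe to use as file names.

```python
import os
from typing import Callable


class SnapshotCache:
    """Read-through cache: a snapshot is fetched from the archive only on
    first access and is served from the local cache directory afterwards."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], bytes]):
        self.cache_dir = cache_dir
        self.fetch = fetch  # hypothetical stand-in for the archive download
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, snapshot_id: str) -> str:
        # Assumption: snapshot_id contains no path separators.
        return os.path.join(self.cache_dir, snapshot_id + ".snap")

    def get(self, snapshot_id: str) -> bytes:
        path = self._path(snapshot_id)
        if os.path.exists(path):
            # Cache hit: read from local disk, no archive access.
            with open(path, "rb") as f:
                return f.read()
        # Cache miss: load lazily from the archive, then store locally.
        data = self.fetch(snapshot_id)
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)  # publish the file only once fully written
        return data
```

Writing to a temporary file and renaming it keeps a partially downloaded snapshot from being mistaken for a complete one. Eviction of old snapshots from the local disk is left out of this sketch.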

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schaffner, J., Eckart, B., Schwarz, C., Brunnert, J., Jacobs, D., Zeier, A. (2011). Towards Analytics-as-a-Service Using an In-Memory Column Database. In: Agrawal, D., Candan, K.S., Li, WS. (eds) New Frontiers in Information and Software as Services. Lecture Notes in Business Information Processing, vol 74. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19294-4_11

  • DOI: https://doi.org/10.1007/978-3-642-19294-4_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19293-7

  • Online ISBN: 978-3-642-19294-4

  • eBook Packages: Computer Science, Computer Science (R0)
