
Providing Scalable Database Services on the Cloud

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6488)

Abstract

The Cloud is fast gaining popularity as a platform for deploying Software as a Service (SaaS) applications. In principle, the Cloud provides unlimited compute resources, enabling deployed services to scale seamlessly. Moreover, the Cloud's pay-as-you-go model reduces the overhead of maintaining applications. Given these advantages, it is attractive to migrate existing software to this new platform. However, challenges remain, as most software applications must be redesigned to take advantage of the Cloud.

In this paper, we present an overview of our ongoing work on epiC, an elastic and efficient power-aware data-intensive Cloud system. We discuss the design issues and the implementation of epiC's storage system and processing engine. The storage system and the processing engine are loosely coupled and are designed to handle two types of workload simultaneously: data-intensive analytical jobs and online transactions (commonly referred to as OLAP and OLTP, respectively). Large-scale analytical jobs in epiC are processed with a phase-based strategy, which provides fine-grained fault tolerance, while queries are processed using indexing and filter-and-refine strategies.
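The filter-and-refine strategy mentioned above can be illustrated with a small, self-contained Python sketch. It is not epiC's implementation: the uniform grid index, the circle-shaped objects, and all names (`GridIndex`, `Circle`, `range_query`) are hypothetical, chosen only to show the two steps. The filter step consults an index built over cheap approximations (minimum bounding rectangles) to prune the search space; the refine step applies the exact, more expensive predicate only to the surviving candidates.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass(frozen=True)
class Circle:
    """Hypothetical data object; its exact shape is costlier to test than its MBR."""
    oid: int
    x: float
    y: float
    r: float

    def mbr(self) -> Rect:
        # Minimum bounding rectangle used as the cheap approximation.
        return (self.x - self.r, self.y - self.r, self.x + self.r, self.y + self.r)


def rects_overlap(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def circle_hits_rect(c: Circle, rect: Rect) -> bool:
    # Exact test: distance from the centre to the nearest point of the rectangle.
    nx = min(max(c.x, rect[0]), rect[2])
    ny = min(max(c.y, rect[1]), rect[3])
    return math.hypot(c.x - nx, c.y - ny) <= c.r


class GridIndex:
    """Hypothetical uniform grid index mapping cells to the objects whose MBRs touch them."""

    def __init__(self, cell_size: float):
        self.cell = cell_size
        self.cells = defaultdict(set)
        self.objects = {}

    def _cell_range(self, rect: Rect):
        x0, y0, x1, y1 = (math.floor(v / self.cell) for v in rect)
        return ((i, j) for i in range(x0, x1 + 1) for j in range(y0, y1 + 1))

    def insert(self, obj: Circle) -> None:
        self.objects[obj.oid] = obj
        for key in self._cell_range(obj.mbr()):
            self.cells[key].add(obj.oid)

    def range_query(self, rect: Rect):
        # Filter: collect objects from overlapping cells whose MBR overlaps the query.
        # A set removes duplicates for objects registered in several cells.
        candidates = set()
        for key in self._cell_range(rect):
            for oid in self.cells.get(key, ()):
                if rects_overlap(self.objects[oid].mbr(), rect):
                    candidates.add(oid)
        # Refine: run the exact geometric test only on the candidates.
        hits = [oid for oid in sorted(candidates)
                if circle_hits_rect(self.objects[oid], rect)]
        return sorted(candidates), hits


if __name__ == "__main__":
    idx = GridIndex(cell_size=10.0)
    for c in [Circle(1, 5.0, 5.0, 2.0), Circle(2, 14.0, 14.0, 2.5), Circle(3, 40.0, 40.0, 1.0)]:
        idx.insert(c)
    candidates, hits = idx.range_query((0.0, 0.0, 12.0, 12.0))
    print("candidates:", candidates, "hits:", hits)
```

In this example, object 2's bounding rectangle overlaps the query window, so it passes the filter step, but the exact test rejects it because the circle itself stays outside the window. Object 3 is never examined, because no grid cell it occupies overlaps the query. The design choice is the usual trade-off: a coarse approximation keeps the index small and the filter cheap, at the cost of some false positives that the refine step must remove.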

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, C. et al. (2010). Providing Scalable Database Services on the Cloud. In: Chen, L., Triantafillou, P., Suel, T. (eds) Web Information Systems Engineering – WISE 2010. WISE 2010. Lecture Notes in Computer Science, vol 6488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17616-6_1

  • DOI: https://doi.org/10.1007/978-3-642-17616-6_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17615-9

  • Online ISBN: 978-3-642-17616-6

  • eBook Packages: Computer Science, Computer Science (R0)