Abstract
The Cloud is fast gaining popularity as a platform for deploying Software as a Service (SaaS) applications. In principle, the Cloud provides unlimited compute resources, enabling deployed services to scale seamlessly. Moreover, the pay-as-you-go model in the Cloud reduces the maintenance overhead of the applications. Given the advantages of the Cloud, it is attractive to migrate existing software to this new platform. However, challenges remain as most software applications need to be redesigned to embrace the Cloud.
In this paper, we present an overview of our ongoing work in developing epiC – an elastic and efficient power-aware data-intensive Cloud system. We discuss the design issues and the implementation of epiC’s storage system and processing engine. The storage system and the processing engine are loosely coupled, and have been designed to handle two types of workload simultaneously, namely data-intensive analytical jobs and online transactions (commonly referred to as OLAP and OLTP, respectively). The processing of large-scale analytical jobs in epiC adopts a phase-based processing strategy, which provides fine-grained fault tolerance, while the processing of queries adopts indexing and filter-and-refine strategies.
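The abstract names filter-and-refine as a query-processing strategy but does not say which filter or index epiC uses. As a generic illustration only, and not epiC's implementation, the sketch below answers a hypothetical range query in two steps: a cheap bounding-box test discards most non-matches, and an exact distance check then runs on the remaining candidates. The function name, the point data, and the choice of a bounding box as the filter are all assumptions made for this example.

```python
from math import hypot


def filter_and_refine(points, center, radius):
    """Return the points within `radius` of `center` (hypothetical example)."""
    cx, cy = center
    # Filter step: a cheap bounding-box test. It may admit false positives,
    # but it never rejects a point that truly lies within the radius.
    candidates = [
        (x, y) for (x, y) in points
        if abs(x - cx) <= radius and abs(y - cy) <= radius
    ]
    # Refine step: the exact Euclidean distance check, applied only to the
    # candidates that survived the filter.
    return [(x, y) for (x, y) in candidates if hypot(x - cx, y - cy) <= radius]


points = [(0, 0), (3, 4), (4, 4), (10, 10)]
result = filter_and_refine(points, center=(0, 0), radius=5)
```

Here the point (10, 10) is dropped by the filter alone, while (4, 4) passes the filter and is rejected only by the exact check. That split is the point of the pattern: the expensive test runs on a small candidate set.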
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, C. et al. (2010). Providing Scalable Database Services on the Cloud. In: Chen, L., Triantafillou, P., Suel, T. (eds) Web Information Systems Engineering – WISE 2010. WISE 2010. Lecture Notes in Computer Science, vol 6488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17616-6_1
DOI: https://doi.org/10.1007/978-3-642-17616-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17615-9
Online ISBN: 978-3-642-17616-6
eBook Packages: Computer Science, Computer Science (R0)