An Effective Scalable SQL Engine for NoSQL Databases
Abstract
NoSQL databases were initially devised to support a few concrete extreme-scale applications. Since the specificity and scale of the target systems justified the investment of manually crafting application code, their limited query and indexing capabilities were not a major impediment. However, with a considerable number of mature alternatives now available, there is an increasing willingness to use NoSQL databases in a wider and more diverse spectrum of applications, and for most of them hand-crafted query code is not an enticing trade-off.
In this paper we address this shortcoming of current NoSQL databases with an effective approach for executing SQL queries while preserving their scalability and schema flexibility. We show how a full-fledged SQL engine can be integrated atop HBase, yielding an ANSI SQL compliant database. Under a standard TPC-C workload, our prototype scales linearly with the number of nodes in the system and outperforms a NoSQL TPC-C implementation optimized for HBase.
Keywords
SQL, NoSQL, Cloud Computing, Middleware