Abstract
Distributed processing commonly requires data to be spread across machines using an a priori static or hash-based data allocation. In this paper, we explore an alternative approach that starts from a master node in control of the complete database and a variable number of worker nodes for delegated query processing. Data is shipped just-in-time to the worker nodes using a need-to-know policy and is reused, where possible, in subsequent queries. A bidding mechanism among the workers yields a schedule that makes the most efficient reuse of previously shipped data, minimizing data transfer costs.
Just-in-time data shipment allows our system to benefit from locally available idle resources to boost overall performance. The system is maintenance-free, and data allocation is fully transparent to users. Our experiments show that the proposed adaptive distributed architecture is a viable and flexible alternative for small-scale MapReduce-type settings.
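The bidding idea can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Worker`, `Master`, `bid`, `schedule`, and `sizes` are hypothetical, and the bid is assumed to be simply the number of bytes that would still have to be shipped to a worker, which the abstract does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    # Ids of data items already shipped to this worker (assumed cache model).
    cache: set = field(default_factory=set)

    def bid(self, needed, sizes):
        # Assumed bid: bytes that must still be shipped for this query.
        # A lower bid means more reuse of previously shipped data.
        return sum(sizes[item] for item in needed - self.cache)


class Master:
    def __init__(self, sizes, workers):
        self.sizes = sizes        # item id -> size in bytes (hypothetical catalog)
        self.workers = workers
        self.shipped_bytes = 0    # total transfer cost so far

    def schedule(self, needed):
        # Collect bids and pick the cheapest; ties go to the first worker listed.
        winner = min(self.workers, key=lambda w: w.bid(needed, self.sizes))
        missing = needed - winner.cache
        self.shipped_bytes += sum(self.sizes[i] for i in missing)
        winner.cache |= missing   # just-in-time shipment, kept for later reuse
        return winner.name, missing


# Illustrative data only: three column-like items and two workers.
sizes = {"a": 400, "b": 400, "c": 100}
master = Master(sizes, [Worker("w1"), Worker("w2", {"c"})])
for query in ({"a", "b"}, {"c"}, {"a", "c"}):
    print(master.schedule(query), master.shipped_bytes)
```

In this sketch only the data a query actually needs is shipped, and a worker that already holds part of it wins the bid. A realistic bid would presumably also account for worker load and cache eviction, which are left out here.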
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ivanova, M., Kersten, M., Groffen, F. (2012). Just-In-Time Data Distribution for Analytical Query Processing. In: Morzy, T., Härder, T., Wrembel, R. (eds) Advances in Databases and Information Systems. ADBIS 2012. Lecture Notes in Computer Science, vol 7503. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33074-2_16
DOI: https://doi.org/10.1007/978-3-642-33074-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33073-5
Online ISBN: 978-3-642-33074-2
eBook Packages: Computer Science, Computer Science (R0)