Gunther: Search-Based Auto-Tuning of MapReduce

Liao, Guangdeng; Datta, Kushal; Willke, Theodore L.

doi:10.1007/978-3-642-40047-6_42

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8097)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

MapReduce has emerged as a very popular programming model for large-scale data analytics. Despite its industry-wide acceptance, the open source Apache™ Hadoop™ framework for MapReduce remains difficult to optimize, particularly in large-scale production environments. The vast search space defined by the hundreds of MapReduce configuration parameters, together with the complex interactions between them, makes manual tuning time consuming, so an automated approach is needed. In this paper we evaluate approaches to the automatic tuning of Hadoop MapReduce, including those that rely on cost-based and machine learning models. We determine that they are inadequate and instead propose a search-based approach called Gunther for Hadoop MapReduce optimization. Gunther uses a Genetic Algorithm that is specially designed to aggressively identify parameter settings that result in near-optimal job execution time. We evaluate Gunther on two types of clusters with different resource characteristics. Our experiments demonstrate that Gunther can obtain near-optimal performance within a small number of trials (fewer than 30), outperforming existing auto-tuning solutions and industry-recommended configurations. We also describe a methodology for reducing the dimensionality of the auto-tuning problem, further improving search efficiency without sacrificing performance improvement.
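
The abstract does not give the details of Gunther's Genetic Algorithm, so the following is only a generic sketch of search-based tuning with a textbook genetic algorithm, not the authors' method. The parameter names are Hadoop 1.x settings, but their ranges, the population size, the mutation rate, and above all the `synthetic_runtime` function (a made-up stand-in for actually running a job and timing it) are assumptions chosen for illustration.

```python
import random

# Assumed search space: parameter name -> (min, max) integer range.
# The ranges are illustrative guesses, not recommendations from the paper.
SEARCH_SPACE = {
    "io.sort.mb": (50, 500),
    "io.sort.factor": (10, 100),
    "mapred.reduce.tasks": (1, 64),
}


def synthetic_runtime(config):
    """Made-up cost function standing in for a measured job execution time."""
    return (
        100.0
        + 0.5 * abs(config["io.sort.mb"] - 300)
        + 1.0 * abs(config["io.sort.factor"] - 60)
        + 2.0 * abs(config["mapred.reduce.tasks"] - 24)
    )


def random_config(rng):
    """Draw one configuration uniformly from the search space."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each parameter is inherited from either parent."""
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in SEARCH_SPACE}


def mutate(config, rng, rate=0.2):
    """Resample each parameter with probability `rate`."""
    child = dict(config)
    for name, (lo, hi) in SEARCH_SPACE.items():
        if rng.random() < rate:
            child[name] = rng.randint(lo, hi)
    return child


def genetic_search(evaluate, population_size=8, generations=10, elite=2, seed=0):
    """Return (best_config, best_time, best-so-far time after each generation)."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    best_config, best_time = None, float("inf")
    history = []
    for _ in range(generations):
        # Each evaluation corresponds to one trial (one job run in a real system).
        scored = sorted(((evaluate(c), c) for c in population), key=lambda p: p[0])
        if scored[0][0] < best_time:
            best_time, best_config = scored[0]
        history.append(best_time)
        # Keep the fastest half as parents and copy the elite unchanged.
        parents = [c for _, c in scored[: population_size // 2]]
        next_population = [c for _, c in scored[:elite]]
        while len(next_population) < population_size:
            a, b = rng.sample(parents, 2)
            next_population.append(mutate(crossover(a, b, rng), rng))
        population = next_population
    return best_config, best_time, history


if __name__ == "__main__":
    config, runtime, _ = genetic_search(synthetic_runtime)
    print(config, runtime)
```

In this sketch every fitness evaluation is a full trial, which is why a search-based tuner has to keep the number of evaluations small. A real tuner would also cache the measured times of elite configurations rather than re-running them each generation.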

Author information

Authors and Affiliations

Intel Labs, Hillsboro, Oregon, USA
Guangdeng Liao, Kushal Datta & Theodore L. Willke

Editor information

Editors and Affiliations

German Research School for Simulation Sciences, RWTH Aachen, Schinkelstr. 2a, 52062, Aachen, Germany
Felix Wolf
Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH, Station 22, 52425, Jülich, Germany
Bernd Mohr
Center for Computing and Communication, RWTH Aachen, Seffenter Weg 23, 52074, Aachen, Germany
Dieter an Mey

About this paper

Cite this paper

Liao, G., Datta, K., Willke, T.L. (2013). Gunther: Search-Based Auto-Tuning of MapReduce. In: Wolf, F., Mohr, B., an Mey, D. (eds) Euro-Par 2013 Parallel Processing. Euro-Par 2013. Lecture Notes in Computer Science, vol 8097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40047-6_42

DOI: https://doi.org/10.1007/978-3-642-40047-6_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40046-9
Online ISBN: 978-3-642-40047-6
eBook Packages: Computer Science, Computer Science (R0)
