Scheduling MapReduce Jobs in HPC Clusters

Neves, Marcelo Veiga; Ferreto, Tiago; De Rose, César

doi:10.1007/978-3-642-32820-6_19

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7484)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

MapReduce (MR) has become a de facto standard for large-scale data analysis. It has also attracted the attention of the HPC community due to its simplicity, efficiency, and highly scalable parallel model. However, MR implementations present some issues that may complicate their execution in existing HPC clusters, especially concerning job submission. While MR requires no strict parameters to submit a job, users of a typical HPC cluster must specify the number of nodes and the amount of time required to complete the job. This paper presents the MR Job Adaptor, a component that optimizes the scheduling of MR jobs alongside HPC jobs in an HPC cluster. Experiments performed using real-world HPC and MapReduce workloads have shown that the MR Job Adaptor can properly transform MR jobs to be scheduled in an HPC cluster, minimizing job turnaround time and exploiting unused resources in the cluster.
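To make the submission mismatch concrete, the following is a minimal sketch of how an MR job could be turned into an HPC-style request of nodes and walltime. It is not the authors' MR Job Adaptor, whose algorithm is not described in the abstract. The names `MRJob`, `estimated_runtime`, `to_hpc_request`, the wave-based runtime model, the `slots_per_node` default, and the `wait_time` callback are all assumptions introduced for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class MRJob:
    """Hypothetical description of an MR job (all fields are assumptions)."""
    map_tasks: int
    reduce_tasks: int
    map_task_secs: float     # assumed average duration of one map task
    reduce_task_secs: float  # assumed average duration of one reduce task


def estimated_runtime(job: MRJob, nodes: int, slots_per_node: int = 2) -> float:
    """Estimate runtime by counting task 'waves' over the available slots.

    Assumes the reduce phase starts only after all map tasks finish.
    """
    slots = nodes * slots_per_node
    map_waves = math.ceil(job.map_tasks / slots)
    reduce_waves = math.ceil(job.reduce_tasks / slots)
    return map_waves * job.map_task_secs + reduce_waves * job.reduce_task_secs


def to_hpc_request(
    job: MRJob,
    max_nodes: int,
    wait_time: Callable[[int, float], float],
) -> Tuple[int, float]:
    """Pick the (nodes, walltime) pair with the lowest estimated turnaround.

    wait_time(nodes, walltime) is a hypothetical estimate of how long such
    a request would sit in the HPC queue before starting.
    """
    best = None
    for nodes in range(1, max_nodes + 1):
        walltime = estimated_runtime(job, nodes)
        turnaround = wait_time(nodes, walltime) + walltime
        if best is None or turnaround < best[0]:
            best = (turnaround, nodes, walltime)
    return best[1], best[2]


if __name__ == "__main__":
    job = MRJob(map_tasks=8, reduce_tasks=2, map_task_secs=10.0, reduce_task_secs=20.0)
    # Toy queue model: larger requests wait longer (an assumption).
    nodes, walltime = to_hpc_request(job, max_nodes=4, wait_time=lambda n, w: 10.0 * n)
    print(nodes, walltime)
```

The sketch only illustrates the trade-off: asking for more nodes shortens the estimated runtime but may lengthen the wait in the queue, so the request with the smallest sum of the two is chosen.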

Author information

Authors and Affiliations

Faculty of Informatics, PUCRS, Brazil
Marcelo Veiga Neves, Tiago Ferreto & César De Rose

Editor information

Editors and Affiliations

University of Patras, Computer Technology Institute and Press “Diophantus”, N. Kazantzaki, 26504, Rio, Greece
Christos Kaklamanis
University of Patras, University Building B, 26504, Rio, Greece
Theodore Papatheodorou
Computer Technology Institute and Press “Diophantus”, University of Patras, N. Kazantzaki, 26504, Rio, Greece
Paul G. Spirakis

Cite this paper

Neves, M.V., Ferreto, T., De Rose, C. (2012). Scheduling MapReduce Jobs in HPC Clusters. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds) Euro-Par 2012 Parallel Processing. Euro-Par 2012. Lecture Notes in Computer Science, vol 7484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32820-6_19

DOI: https://doi.org/10.1007/978-3-642-32820-6_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32819-0
Online ISBN: 978-3-642-32820-6
eBook Packages: Computer Science, Computer Science (R0)
