Many-Core Branch-and-Bound for GPU Accelerators and MIC Coprocessors

Melab, Nouredine; Gmys, Jan; Mezmaz, Mohand; Tuyttens, Daniel

doi:10.1007/978-3-030-18764-4_12

Nouredine Melab^6,7,
Jan Gmys⁸,
Mohand Mezmaz⁸ &
…
Daniel Tuyttens⁸

Part of the book series: Studies in Computational Intelligence ((SCI,volume 833))

990 Accesses
5 Citations

Abstract

Coprocessors are increasingly becoming key building blocks of High Performance Computing platforms. These many-core energy-efficient devices boost the performance of traditional processors. On the other hand, Branch-and-Bound (B&B) algorithms are tree-based exact methods for solving to optimality combinatorial optimization problems (COPs). Solving large COPs results in the generation of a very large pool of subproblems and the evaluation of their associated lower bounds. Generating and evaluating those subproblems on coprocessors raises several issues including processor-coprocessor data transfer optimization, vectorization, thread divergence, and so on. In this paper, we investigate the offload-based parallel design and implementation of B&B algorithms for coprocessors addressing these issues. Two major many-core architectures are considered and compared: Nvidia GPU and Intel MIC. The proposed approaches have been experimented using the Flow-Shop scheduling problem and two hardware configurations equivalent in terms of energy consumption: Nvidia Tesla K40 and Intel Xeon Phi 5110P. The reported results show that the GPU-accelerated approach outperforms the MIC offload-based one even in its vectorized version. Moreover, vectorization improves the efficiency of the MIC offload-based approach with a factor of two.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Hardcover Book: USD 159.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
GPU and MIC stand for respectively Graphics Processing Unit and Many Integrated Cores.
2.
An optimization problem consists of minimizing or maximizing a cost function. Without loss of generality, in this paper the minimization case and the permutation Flow-Shop scheduling problem are considered.
3.
http://d3f8ykwhia686p.cloudfront.net/1live/intel/CompilerAutovectorizationGuide.eps.

References

Barreto, L., Bauer, M.: Parallel branch and bound algorithm-a comparison between serial, OpenMP and MPI implementations. J. Phys. Conf. Ser. 256, 012018 (2010)
Google Scholar
Bolze, R., et al.: Grid’5000: a large scale and highly reconfigurable experimental grid testbed. IJHPCA 20(4), 481–494 (2006)
Article Google Scholar
Chakroun, I.: Parallel heterogeneous branch and bound algorithms for multi-core and multi-GPU environments. Ph.D. thesis from Université Lille 1 (2013)
Google Scholar
Chakroun, I., Melab, N., Mezmaz, M., Tuyttens, D.: Combining multi-core and GPU computing for solving combinatorial optimization problems. J. Parallel Distrib. Comput. 73(12), 1563–1577 (2013). https://doi.org/10.1016/j.jpdc.2013.07.023
Article Google Scholar
Chakroun, I., Mezmaz, M., Melab, N., Bendjoudi, A.: Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm. Concurr. Comput. Pract. Exp. 25(8), 1121–1136 (2013)
Article Google Scholar
Chrysos, G.: Intel Xeon Phi X100 family coprocessor - the architecture. Intel Inc. (2012). https://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-codename-knights-corner
Garey, M., Johnson, D., Sethi, R.: The complexity of flow-shop and job-shop scheduling. Math. Oper. Res. 1, 117–129 (1976)
Google Scholar
Johnson, S.: Optimal two and three-stage production schedules with setup times included. Nav. Res. Logistis Q. 1, 61–68 (1954)
Article Google Scholar
Lageweg, B., Lenstra, J., Kan, A.R.: A general bounding scheme for the permutation flow-shop problem. Oper. Res. 26(1), 53–67 (1978)
Article Google Scholar
Lalami, M.: Contribution à la résolution de problèms d’optimisation combinatoire: méthodes séquentielles et parallèles. Université de Toulouse III - Paul Sabatier (2012)
Google Scholar
Lawler, E.L., Wood, D.E.: Branch-and-bound methods: a survey. Oper. Res. 14(4), 699–719 (1966). https://doi.org/10.1287/opre.14.4.699
Article MathSciNet Google Scholar
Melab, N., Chakroun, I., Bendjoudi, A.: Graphics processing unit-accelerated bounding for branch-and-bound applied to a permutation problem using data access optimization. Concurr. Comput. Pract. Exp. 26(16), 2667–2683 (2014)
Article Google Scholar
Melab, N., Chakroun, I., Mezmaz, M., Tuyttens, D.: A GPU-accelerated branch-and-bound algorithm for the flow-shop scheduling problem. In: 2012 IEEE International Conference on Cluster Computing, CLUSTER 2012, Beijing, China, 24–28 Sept 2012, pp. 10–17 (2012)
Google Scholar
Melab, N., Leroy, R., Mezmaz, M., Tuyttens, D.: Parallel branch-and-bound using private IVM-based work stealing on xeon phi MIC coprocessor. In: 2015 International Conference on High Performance Computing and Simulation, HPCS 2015, Amsterdam, Netherlands, 20–24 July 2015, pp. 394–399 (2015)
Google Scholar
Mezmaz, M., Melab, N., Talbi, E.G.: A grid-enabled branch and bound algorithm for solving challenging combinatorial optimization problems. In: Proceedings of 21th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Long Beach, California (2007)
Google Scholar
Nvidia: NVIDIA’s next generation CUDA compute architecture: Kepler GK110/210. Whitepaper (2014)
Google Scholar
Saini, S., Jin, H., Jespersen, D., Cheung, S., Djomehri, M., Chang, J., Hood, R.: Early multi-node performance evaluation of a knights corner (KNC) based NASA supercomputer. In: 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, IPDPS 2015, Hyderabad, India, 25–29 May 2015, pp. 57–67 (2015)
Google Scholar
Top500 HPC international ranking. http://www.top500.org/
Vu, T.T.: Heterogeneity and locality-aware work stealing for large scale branch-and-bound irregular algorithms. Ph.D. thesis from Université Lille 1 (2014)
Google Scholar

Download references

Acknowledgements

Experiments presented in this paper were carried out using the Intel Xeon Phi of Digitalis (http://digitalis.imag.fr) and Grid’5000 testbed (https://www.grid5000.fr). Therefore, we would like to thank the Digitalis staff and especially Pierre Neyron from LIG/CNRS Lab for making the MIC processor fully functional and available. Grid’5000 is supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations.

Author information

Authors and Affiliations

Inria Lille - Nord Europe, Lille, France
Nouredine Melab
CNRS/CRIStAL, Université Lille 1, Villeneuve-d’Ascq, France
Nouredine Melab
Mathematics and Operational Research Department (MathRO), University of Mons, Mons, Belgium
Jan Gmys, Mohand Mezmaz & Daniel Tuyttens

Authors

Nouredine Melab
View author publications
You can also search for this author in PubMed Google Scholar
Jan Gmys
View author publications
You can also search for this author in PubMed Google Scholar
Mohand Mezmaz
View author publications
You can also search for this author in PubMed Google Scholar
Daniel Tuyttens
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Nouredine Melab .

Editor information

Editors and Affiliations

TH Köln, Cologne, Germany
Thomas Bartz-Beielstein
Jožef Stefan Institute, Ljubljana, Slovenia
Bogdan Filipič
Jožef Stefan Institute, Ljubljana, Slovenia
Peter Korošec
University Lille, Lille, France
El-Ghazali Talbi

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Melab, N., Gmys, J., Mezmaz, M., Tuyttens, D. (2020). Many-Core Branch-and-Bound for GPU Accelerators and MIC Coprocessors. In: Bartz-Beielstein, T., Filipič, B., Korošec, P., Talbi, EG. (eds) High-Performance Simulation-Based Optimization. Studies in Computational Intelligence, vol 833. Springer, Cham. https://doi.org/10.1007/978-3-030-18764-4_12

Download citation

DOI: https://doi.org/10.1007/978-3-030-18764-4_12
Published: 02 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18763-7
Online ISBN: 978-3-030-18764-4
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)

Publish with us

Policies and ethics