Skip to main content

Many-Core Branch-and-Bound for GPU Accelerators and MIC Coprocessors

  • Chapter
  • First Online:
High-Performance Simulation-Based Optimization

Part of the book series: Studies in Computational Intelligence ((SCI,volume 833))

Abstract

Coprocessors are increasingly becoming key building blocks of High Performance Computing platforms. These many-core energy-efficient devices boost the performance of traditional processors. On the other hand, Branch-and-Bound (B&B) algorithms are tree-based exact methods for solving to optimality combinatorial optimization problems (COPs). Solving large COPs results in the generation of a very large pool of subproblems and the evaluation of their associated lower bounds. Generating and evaluating those subproblems on coprocessors raises several issues including processor-coprocessor data transfer optimization, vectorization, thread divergence, and so on. In this paper, we investigate the offload-based parallel design and implementation of B&B algorithms for coprocessors addressing these issues. Two major many-core architectures are considered and compared: Nvidia GPU and Intel MIC. The proposed approaches have been experimented using the Flow-Shop scheduling problem and two hardware configurations equivalent in terms of energy consumption: Nvidia Tesla K40 and Intel Xeon Phi 5110P. The reported results show that the GPU-accelerated approach outperforms the MIC offload-based one even in its vectorized version. Moreover, vectorization improves the efficiency of the MIC offload-based approach with a factor of two.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 159.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    GPU and MIC stand for respectively Graphics Processing Unit and Many Integrated Cores.

  2. 2.

    An optimization problem consists of minimizing or maximizing a cost function. Without loss of generality, in this paper the minimization case and the permutation Flow-Shop scheduling problem are considered.

  3. 3.

    http://d3f8ykwhia686p.cloudfront.net/1live/intel/CompilerAutovectorizationGuide.eps.

References

  1. Barreto, L., Bauer, M.: Parallel branch and bound algorithm-a comparison between serial, OpenMP and MPI implementations. J. Phys. Conf. Ser. 256, 012018 (2010)

    Google Scholar 

  2. Bolze, R., et al.: Grid’5000: a large scale and highly reconfigurable experimental grid testbed. IJHPCA 20(4), 481–494 (2006)

    Article  Google Scholar 

  3. Chakroun, I.: Parallel heterogeneous branch and bound algorithms for multi-core and multi-GPU environments. Ph.D. thesis from Université Lille 1 (2013)

    Google Scholar 

  4. Chakroun, I., Melab, N., Mezmaz, M., Tuyttens, D.: Combining multi-core and GPU computing for solving combinatorial optimization problems. J. Parallel Distrib. Comput. 73(12), 1563–1577 (2013). https://doi.org/10.1016/j.jpdc.2013.07.023

    Article  Google Scholar 

  5. Chakroun, I., Mezmaz, M., Melab, N., Bendjoudi, A.: Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm. Concurr. Comput. Pract. Exp. 25(8), 1121–1136 (2013)

    Article  Google Scholar 

  6. Chrysos, G.: Intel Xeon Phi X100 family coprocessor - the architecture. Intel Inc. (2012). https://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-codename-knights-corner

  7. Garey, M., Johnson, D., Sethi, R.: The complexity of flow-shop and job-shop scheduling. Math. Oper. Res. 1, 117–129 (1976)

    Google Scholar 

  8. Johnson, S.: Optimal two and three-stage production schedules with setup times included. Nav. Res. Logistis Q. 1, 61–68 (1954)

    Article  Google Scholar 

  9. Lageweg, B., Lenstra, J., Kan, A.R.: A general bounding scheme for the permutation flow-shop problem. Oper. Res. 26(1), 53–67 (1978)

    Article  Google Scholar 

  10. Lalami, M.: Contribution à la résolution de problèms d’optimisation combinatoire: méthodes séquentielles et parallèles. Université de Toulouse III - Paul Sabatier (2012)

    Google Scholar 

  11. Lawler, E.L., Wood, D.E.: Branch-and-bound methods: a survey. Oper. Res. 14(4), 699–719 (1966). https://doi.org/10.1287/opre.14.4.699

    Article  MathSciNet  Google Scholar 

  12. Melab, N., Chakroun, I., Bendjoudi, A.: Graphics processing unit-accelerated bounding for branch-and-bound applied to a permutation problem using data access optimization. Concurr. Comput. Pract. Exp. 26(16), 2667–2683 (2014)

    Article  Google Scholar 

  13. Melab, N., Chakroun, I., Mezmaz, M., Tuyttens, D.: A GPU-accelerated branch-and-bound algorithm for the flow-shop scheduling problem. In: 2012 IEEE International Conference on Cluster Computing, CLUSTER 2012, Beijing, China, 24–28 Sept 2012, pp. 10–17 (2012)

    Google Scholar 

  14. Melab, N., Leroy, R., Mezmaz, M., Tuyttens, D.: Parallel branch-and-bound using private IVM-based work stealing on xeon phi MIC coprocessor. In: 2015 International Conference on High Performance Computing and Simulation, HPCS 2015, Amsterdam, Netherlands, 20–24 July 2015, pp. 394–399 (2015)

    Google Scholar 

  15. Mezmaz, M., Melab, N., Talbi, E.G.: A grid-enabled branch and bound algorithm for solving challenging combinatorial optimization problems. In: Proceedings of 21th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Long Beach, California (2007)

    Google Scholar 

  16. Nvidia: NVIDIA’s next generation CUDA compute architecture: Kepler GK110/210. Whitepaper (2014)

    Google Scholar 

  17. Saini, S., Jin, H., Jespersen, D., Cheung, S., Djomehri, M., Chang, J., Hood, R.: Early multi-node performance evaluation of a knights corner (KNC) based NASA supercomputer. In: 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, IPDPS 2015, Hyderabad, India, 25–29 May 2015, pp. 57–67 (2015)

    Google Scholar 

  18. Top500 HPC international ranking. http://www.top500.org/

  19. Vu, T.T.: Heterogeneity and locality-aware work stealing for large scale branch-and-bound irregular algorithms. Ph.D. thesis from Université Lille 1 (2014)

    Google Scholar 

Download references

Acknowledgements

Experiments presented in this paper were carried out using the Intel Xeon Phi of Digitalis (http://digitalis.imag.fr) and Grid’5000 testbed (https://www.grid5000.fr). Therefore, we would like to thank the Digitalis staff and especially Pierre Neyron from LIG/CNRS Lab for making the MIC processor fully functional and available. Grid’5000 is supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Nouredine Melab .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Check for updates. Verify currency and authenticity via CrossMark

Cite this chapter

Melab, N., Gmys, J., Mezmaz, M., Tuyttens, D. (2020). Many-Core Branch-and-Bound for GPU Accelerators and MIC Coprocessors. In: Bartz-Beielstein, T., Filipič, B., Korošec, P., Talbi, EG. (eds) High-Performance Simulation-Based Optimization. Studies in Computational Intelligence, vol 833. Springer, Cham. https://doi.org/10.1007/978-3-030-18764-4_12

Download citation

Publish with us

Policies and ethics