
Work Distribution of Data-Parallel Applications on Heterogeneous Systems

  • Conference paper
High Performance Computing (ISC High Performance 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9945)


Abstract

Heterogeneous computing systems offer high peak performance and energy efficiency, and utilizing this potential is essential to achieve extreme-scale performance. However, sharing the work optimally among the processing elements of a heterogeneous system is not straightforward. In this paper, we propose an approach that uses combinatorial optimization to search for an optimal system configuration in a given parameter space. The optimization goal is to determine the number of threads, the thread affinities, and the workload partitioning such that the overall execution time is minimized. For combinatorial optimization we use simulated annealing. We evaluate our approach with a DNA sequence analysis application on a heterogeneous platform that comprises two Intel Xeon E5 processors and an Intel Xeon Phi 7120P co-processor. The results demonstrate that the near-optimal system configuration determined by our simulated-annealing-based algorithm improves application performance.
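The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the parameter values, the `estimated_time` cost model, the `neighbor` move, and the cooling schedule are all hypothetical choices made for illustration. In practice, the cost function would be a measured or predicted execution time of the application for the given configuration.

```python
import math
import random

# Hypothetical search space (illustrative values, not taken from the paper).
HOST_THREADS = [2, 6, 12, 24, 36, 48]
DEVICE_THREADS = [30, 60, 120, 180, 240]
AFFINITIES = ["compact", "scatter", "balanced"]
HOST_FRACTIONS = [i / 10 for i in range(11)]  # share of the work kept on the host
SPACE = [HOST_THREADS, DEVICE_THREADS, AFFINITIES, HOST_FRACTIONS]


def estimated_time(cfg):
    """Toy cost model (an assumption) standing in for a measured execution time."""
    host_threads, device_threads, affinity, host_fraction = cfg
    penalty = {"compact": 1.15, "scatter": 1.0, "balanced": 1.05}[affinity]
    host_time = 100.0 * host_fraction / host_threads ** 0.9
    device_time = 100.0 * (1.0 - host_fraction) * penalty / (0.25 * device_threads ** 0.9)
    # Host and device work concurrently, so the slower side sets the total time.
    return max(host_time, device_time)


def neighbor(cfg, rng):
    """Move one randomly chosen parameter to an adjacent value in its list."""
    idx = [options.index(value) for options, value in zip(SPACE, cfg)]
    dim = rng.randrange(len(SPACE))
    step = rng.choice([-1, 1])
    idx[dim] = min(max(idx[dim] + step, 0), len(SPACE[dim]) - 1)
    return tuple(options[i] for options, i in zip(SPACE, idx))


def simulated_annealing(cost, start, rng, t_start=10.0, t_end=1e-3,
                        cooling=0.95, moves_per_temp=20):
    """Return the best configuration seen and its cost."""
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            cand = neighbor(current, rng)
            cand_cost = cost(cand)
            delta = cand_cost - current_cost
            # Metropolis criterion: always accept improvements, and accept
            # worse candidates with a probability that shrinks as t cools.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = cand, cand_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= cooling  # geometric cooling schedule
    return best, best_cost


rng = random.Random(0)  # fixed seed so the run is repeatable
start = (HOST_THREADS[0], DEVICE_THREADS[0], AFFINITIES[0], HOST_FRACTIONS[5])
best, best_time = simulated_annealing(estimated_time, start, rng)
print(best, round(best_time, 3))
```

With this toy model the space has only 990 configurations, so exhaustive search would also be feasible. Simulated annealing becomes attractive when each evaluation requires running the application and the parameter space is too large to enumerate.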

This research has received funding from the Swedish Knowledge Foundation under Grant No. 20150088.




Corresponding author

Correspondence to Suejb Memeti.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Memeti, S., Pllana, S. (2016). Work Distribution of Data-Parallel Applications on Heterogeneous Systems. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_6


  • DOI: https://doi.org/10.1007/978-3-319-46079-6_6


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46078-9

  • Online ISBN: 978-3-319-46079-6

  • eBook Packages: Computer Science, Computer Science (R0)
