Work Distribution of Data-Parallel Applications on Heterogeneous Systems

Memeti, Suejb; Pllana, Sabri

doi:10.1007/978-3-319-46079-6_6

Suejb Memeti¹⁶ &
Sabri Pllana¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 9945))

Included in the following conference series:

International Conference on High Performance Computing

2375 Accesses

Abstract

Heterogeneous computing systems offer high peak performance and energy efficiency, and utilizing this potential is essential to achieve extreme-scale performance. However, optimal sharing of the work among processing elements in heterogeneous systems is not straightforward. In this paper, we propose an approach that uses combinatorial optimization to search for optimal system configuration in a given parameter space. The optimization goal is to determine the number of threads, thread affinities, and workload partitioning, such that the overall execution time is minimized. For combinatorial optimization we use the Simulated Annealing. We evaluate our approach with a DNA sequence analysis application on a heterogeneous platform that comprises two Intel Xeon E5 processors and an Intel Xeon Phi 7120P co-processor. The obtained results demonstrate that using the near-optimal system configuration, determined by our algorithm based on the simulated annealing, application performance is improved.

This research has received funding from the Swedish Knowledge Foundation under Grant No. 20150088.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

TOP500 Supercomputer Sites. http://www.top500.org/. Accessed Jan 2016
Abraham, E., Bekas, C., Brandic, I., Genaim, S., Johnsen, E.B.,Kondov, I., Pllana, S., Streit, A.: Preparing HPC applications for exascale: challenges and recommendations. In: 2015 18th International Conference on Network-Based Information Systems (NBiS), pp. 401–406, September 2015
Google Scholar
Albayrak, O.E., Akturk, I., Ozturk, O.: Improving application behavior on heterogeneous manycore systems through kernel mapping. Parallel Comput. 39(12), 867–878 (2013). http://dx.doi.org/10.1016/j.parco.2013.08.011
Article Google Scholar
Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.A.: StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput. Pract. Exp. 23(2), 187–198 (2011)
Article Google Scholar
Ayguadé, E., Blainey, B., Duran, A., Labarta, J., Martínez, F., Martorell, X., Silvera, R.: Is the Schedule clause really necessary in OpenMP? In: Voss, M.J. (ed.) WOMPAT 2003. LNCS, vol. 2716, pp. 147–159. Springer, Heidelberg (2003). doi:10.1007/3-540-45009-2_12
Chapter Google Scholar
Benkner, S., Pllana, S., Traff, J., Tsigas, P., Dolinsky, U., Augonnet, C., Bachmayer, B., Kessler, C., Moloney, D., Osipov, V.: PEPPHER: efficient and productive usage of hybrid computing systems. IEEE Micro 31(5), 28–41 (2011)
Article Google Scholar
Braun, T.D., Siegel, H.J., Beck, N., Bölöni, L.L., Maheswaran, M., Reuther, A.I., Robertson, J.P., Theys, M.D., Yao, B., Hensgen, D., et al.: A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems. J. Parallel Distrib. Comput. 61(6), 810–837 (2001)
Article MATH Google Scholar
Chrysos, G.: Intel\(\textregistered \) Xeon Phi Coprocessor-the Architecture. Intel Whitepaper (2014)
Google Scholar
Dokulil, J., Bajrovic, E., Benkner, S., Pllana, S., Sandrieser, M.,Bachmayer, B.: High-level support for hybrid parallel execution of C++ applications targeting Intel Xeon Phi coprocessors. In: ICCS. Procedia Computer Science, vol. 18, pp. 2508–2511. Elsevier (2013)
Google Scholar
Duran, A., Ayguadé, E., Badia, R.M., Labarta, J., Martinell, L., Martorell, X., Planas, J.: OmpSs: a proposal for programming heterogeneous multi-core architectures. Parallel Process. Lett. 21(02), 173–193 (2011)
Article MathSciNet Google Scholar
Grewe, D., O’Boyle, M.F.P.: A static task partitioning approach for heterogeneous systems using OpenCL. In: Knoop, J. (ed.) CC 2011. LNCS, vol. 6601, pp. 286–305. Springer, Heidelberg (2011). doi:10.1007/978-3-642-19861-8_16
Chapter Google Scholar
Kessler, C.W., Dastgeer, U., Thibault, S., Namyst, R., Richards, A., Dolinsky, U., Benkner, S., Traff, J.L., Pllana, S.: Programmability and performance portability aspects of heterogeneous multi-/manycore systems, pp. 1403–1408. IEEE (2012)
Google Scholar
Khan, F., Han, Y., Pllana, S., Brezany, P.: Estimation of parameters sensitivity for scientific workflows. In: 2009 International Conference on Parallel Processing Workshops, pp. 457–462, September 2009
Google Scholar
Khan, F., Han, Y., Pllana, S., Brezany, P.: An ant-colony-optimization based approach for determination of parameter significance of scientific workflows. In: 2010 24th IEEEInternational Conference on Advanced Information Networking and Applications (AINA), pp. 1241–1248, April 2010
Google Scholar
Kołodziej, J., Khan, S.U.: Data scheduling in data grids and data centers: a short taxonomy of problems and intelligent resolution techniques. In: Nguyen, N.-T., Kołodziej, J., Burczyński, T., Biba, M. (eds.) Transactions on Computational Collective Intelligence X. LNCS, vol. 7776, pp. 103–119. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38496-7_7
Chapter Google Scholar
Liu, Y., Pan, T., Aluru, S.: Parallel pairwise correlationcomputation on intel xeon phi clusters. arXiv preprint arXiv:1605.01584 (2016)
Luk, C.K., Hong, S., Kim, H.: Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping. In: 42nd Annual IEEE/ACM International Symposium on Microarchitecture, 2009, MICRO-42, pp. 45–55. IEEE (2009)
Google Scholar
Memeti, S., Pllana, S.: PaREM: a novel approach for parallel regular expression matching. In: 17th International Conference on Computational Science and Engineering (CSE 2014), pp. 690–697, December 2014
Google Scholar
Memeti, S., Pllana, S.: Accelerating DNA sequence analysis using Intel Xeon Phi. In: PBio at the 2015 IEEE International Symposiumon Parallel and Distributed Processing with Applications (ISPA). IEEE (2015)
Google Scholar
Memeti, S., Pllana, S.: Analyzing large-scale DNA sequences on multi-core architectures. In: 18th IEEE International Conference on Computational Science and Engineering (CSE 2015). IEEE (2015)
Google Scholar
Nakao, M., Lee, J., Boku, T., Sato, M.: XcalableMP implementationand performance of NAS parallel benchmarks. In: Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, p. 11. ACM (2010)
Google Scholar
NCBI: National Center for Biotechnology Information U.S. NationalLibrary of Medicine (2015). http://www.ncbi.nlm.nih.gov/genbank. Accessed Dec 2015
Odajima, T., Boku, T., Hanawa, T., Lee, J., Sato, M.: GPU/CPU work sharing with parallel language XcalableMP-dev for parallelized accelerated computing. In: 2012 41st International Conference on Parallel Processing Workshops (ICPPW), pp. 97–106. IEEE (2012)
Google Scholar
Pllana, S., Benkner, S., Xhafa, F., Barolli, L.: Hybrid performance modeling and prediction of large-scale computing systems. In: International Conference on Complex, Intelligent and Software Intensive Systems, 2008, CISIS 2008, pp. 132–138, March 2008
Google Scholar
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes in C: The Art of Scientific Computing, 3rd edn. Cambridge University Press, Cambridge (2007)
MATH Google Scholar
Ravi, V.T., Agrawal, G.: A dynamic scheduling framework for emerging heterogeneous systems. In: 2011 18th International Conference on High Performance Computing (HiPC), pp. 1–10. IEEE (2011)
Google Scholar
Scogland, T.R.W., Feng, W., Rountree, B., Supinski, B.R.: CoreTSAR: adaptive worksharing for heterogeneous systems. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds.) ISC 2014. LNCS, vol. 8488, pp. 172–186. Springer, Heidelberg (2014). doi:10.1007/978-3-319-07518-1_11
Google Scholar
Viebke, A., Pllana, S.: The potential of the Intel (R) Xeon Phi forsupervised deep learning. In: 2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), pp. 758–765. IEEE (2015)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, Linnaeus University, 351 95, Vaxjo, Sweden
Suejb Memeti & Sabri Pllana

Authors

Suejb Memeti
View author publications
You can also search for this author in PubMed Google Scholar
Sabri Pllana
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Suejb Memeti .

Editor information

Editors and Affiliations

University of Delaware, Newark, Delaware, USA
Michela Taufer
Forschungszentrum Jülich, Jülich, Germany
Bernd Mohr
DKRZ, Hamburg, Germany
Julian M. Kunkel

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Memeti, S., Pllana, S. (2016). Work Distribution of Data-Parallel Applications on Heterogeneous Systems. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science(), vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_6

Download citation

DOI: https://doi.org/10.1007/978-3-319-46079-6_6
Published: 06 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46078-9
Online ISBN: 978-3-319-46079-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics