How Pre-multicore Methods and Algorithms Perform in Multicore Era

  • Alexey Lastovetsky
  • Muhammad Fahad
  • Hamidreza Khaleghzadeh
  • Semyon Khokhriakov
  • Ravi Reddy
  • Arsalan Shahid
  • Lukasz Szustak
  • Roman Wyrzykowski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11203)


Many classical methods and algorithms developed when single-core CPUs dominated the parallel computing landscape are still widely used in the changed multicore world. Two prominent examples are load balancing, which has been one of the main techniques for minimizing the computation time of parallel applications since the beginning of parallel computing, and model-based power/energy measurement techniques using performance events. In this paper, we show that in the multicore era, load balancing is no longer synonymous with optimization, and we present recent methods and algorithms for optimizing parallel applications for performance and energy on modern HPC platforms. These methods do not rely on load balancing and often return imbalanced but optimal solutions.
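To illustrate how an imbalanced distribution can beat the balanced one, here is a minimal sketch assuming two identical cores whose execution time as a function of workload is given by an invented table with a performance spike at 5 work units (standing in for the irregular time profiles of multicore processors). It uses a brute-force search over all distributions, which is exponential in the number of processors; it is not one of the partitioning algorithms presented in the paper, and the names `optimal_partition`, `PROFILE` and `core_time` are hypothetical.

```python
from itertools import product


def optimal_partition(n, time_fns):
    """Brute-force search over every way to split n indivisible work units
    among len(time_fns) processors; return (distribution, makespan) with the
    smallest makespan, i.e. the largest per-processor execution time."""
    best = None
    for dist in product(range(n + 1), repeat=len(time_fns)):
        if sum(dist) != n:
            continue  # not a complete distribution of the workload
        makespan = max(t(x) for t, x in zip(time_fns, dist))
        if best is None or makespan < best[1]:
            best = (dist, makespan)
    return best


# Hypothetical time profile of one core: execution time (arbitrary units) for
# 0..10 work units. The spike at 5 units is invented to mimic an irregular
# multicore profile; the values are not measurements.
PROFILE = [0, 1, 2, 3, 4, 9, 6, 7, 8, 9, 10]


def core_time(x):
    return PROFILE[x]


balanced = (5, 5)
print(max(core_time(x) for x in balanced))            # 9
print(optimal_partition(10, [core_time, core_time]))  # ((4, 6), 6)
```

With this profile the balanced split (5, 5) takes 9 time units, while the split (4, 6) finishes in 6: the two cores run for 4 and 6 units respectively, so the optimal solution is deliberately imbalanced.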

We also show that some fundamental assumptions about performance events, which have to be true for the model-based power/energy measurement tools to be accurate, are increasingly difficult to satisfy as the number of CPU cores increases. Therefore, energy-aware computing methods relying on these tools will be increasingly difficult to verify.
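The abstract does not name these assumptions. One examined in related work by several of the same authors is additivity: the count of an event for a serial composition of two applications should equal the sum of its counts for the two applications run alone. Energy itself behaves this way, so a linear energy model built on a non-additive event predicts inconsistent energy for the compound run. The sketch below checks this for invented counts; the 5% tolerance, the model coefficients and all function names are hypothetical.

```python
def additivity_error_pct(count_a, count_b, count_ab):
    """Relative deviation (in %) of the compound run's event count from
    the sum of the counts observed when the two base runs execute alone."""
    expected = count_a + count_b
    return abs(count_ab - expected) / expected * 100.0


def is_additive(count_a, count_b, count_ab, tolerance_pct=5.0):
    """Treat an event as additive if its deviation stays within tolerance.
    The 5% default is an illustrative threshold, not a published value."""
    return additivity_error_pct(count_a, count_b, count_ab) <= tolerance_pct


def linear_energy(coeffs, counts):
    """Linear energy model E = sum_i beta_i * e_i (coefficients hypothetical)."""
    return sum(beta * e for beta, e in zip(coeffs, counts))


# Invented counts for two events, measured for base runs A and B and for
# the compound run AB (A followed immediately by B).
counts_a = [1000, 200]
counts_b = [3000, 100]
counts_ab = [4010, 900]

additive = [is_additive(a, b, ab) for a, b, ab in zip(counts_a, counts_b, counts_ab)]
print(additive)  # [True, False]

coeffs = [0.01, 0.5]  # hypothetical joules per event
e_sum = linear_energy(coeffs, counts_a) + linear_energy(coeffs, counts_b)
e_ab = linear_energy(coeffs, counts_ab)
print(round(e_sum, 2), round(e_ab, 2))
```

The second event is far from additive, so the model's prediction for the compound run differs sharply from the sum of its predictions for the base runs, even though the measured energy of the compound run would be expected to match that sum.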


Keywords: Multicore platforms, Load balancing, Power and energy modeling, Performance monitoring counters



This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number 14/IA/2474. This work is partially supported by the EU under the COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS).

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Alexey Lastovetsky (1)
  • Muhammad Fahad (1)
  • Hamidreza Khaleghzadeh (1)
  • Semyon Khokhriakov (1)
  • Ravi Reddy (1)
  • Arsalan Shahid (1)
  • Lukasz Szustak (2)
  • Roman Wyrzykowski (2)

  1. School of Computer Science, University College Dublin, Dublin 4, Ireland
  2. Czestochowa University of Technology, Czestochowa, Poland