Multi Objective Optimization of HPC Kernels for Performance, Power, and Energy

  • Prasanna Balaprakash
  • Ananta Tiwari
  • Stefan M. Wild
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8551)


Code optimization in the high-performance computing realm has traditionally focused on reducing execution time. The problem, in mathematical terms, has been expressed as a single-objective optimization problem. The expected concerns of next-generation systems, however, demand a more detailed analysis of the interplay between execution time and other metrics. Metrics such as power, performance, energy, and resiliency may all be targeted together and traded against one another. We present a multi-objective formulation of the code optimization problem. Our proposed framework helps one explore potential tradeoffs among multiple objectives and provides a significantly richer analysis than can be achieved by treating additional metrics as hard constraints. We empirically examine a variety of metrics, architectures, and code optimization decisions and provide evidence that such tradeoffs exist in practice.
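
To make the contrast between a multi-objective formulation and a hard constraint concrete, the following is a minimal sketch and not the authors' framework. It assumes each code variant has been measured for two objectives to be minimized, execution time and average power. The `Variant` type, the helper names, and all measurements are hypothetical and serve only as illustration.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """One code variant with two measured objectives (both minimized)."""
    name: str
    time_s: float   # execution time in seconds (hypothetical data)
    power_w: float  # average power in watts (hypothetical data)


def dominates(a: Variant, b: Variant) -> bool:
    """True if a is no worse than b in both objectives and better in one."""
    no_worse = a.time_s <= b.time_s and a.power_w <= b.power_w
    strictly_better = a.time_s < b.time_s or a.power_w < b.power_w
    return no_worse and strictly_better


def pareto_front(variants: List[Variant]) -> List[Variant]:
    """Keep every variant that no other variant dominates."""
    return [v for v in variants
            if not any(dominates(other, v) for other in variants)]


def fastest_under_power_cap(variants: List[Variant], cap_w: float) -> Variant:
    """Hard-constraint alternative: minimize time subject to power <= cap."""
    feasible = [v for v in variants if v.power_w <= cap_w]
    return min(feasible, key=lambda v: v.time_s)


# Hypothetical measurements, invented for illustration only.
variants = [
    Variant("v1", 10.0, 120.0),
    Variant("v2", 12.0, 100.0),
    Variant("v3", 11.0, 130.0),  # dominated by v1
    Variant("v4", 15.0, 95.0),
    Variant("v5", 15.0, 110.0),  # dominated by v2 and v4
]

front = pareto_front(variants)
front_names = [v.name for v in front]
capped_choice = fastest_under_power_cap(variants, cap_w=105.0)

print("Pareto front:", front_names)
print("Fastest under 105 W cap:", capped_choice.name)
```

In this sketch the constrained search returns a single variant for one chosen power cap, whereas the Pareto front exposes every non-dominated time/power tradeoff at once, which is the kind of richer analysis the abstract refers to.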


Multi Objective Optimization, Pareto Front, Clock Frequency, Decision Space, Code Variant
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Prasanna Balaprakash (1, 2)
  • Ananta Tiwari (3)
  • Stefan M. Wild (1), corresponding author
  1. Argonne National Laboratory, Mathematics and Computer Science Division, Argonne, USA
  2. Argonne National Laboratory, Leadership Computing Facility, Argonne, USA
  3. Performance Modeling and Characterization (PMaC) Lab, San Diego Supercomputer Center, La Jolla, USA
