Tuning HipGISAXS on Multi and Many Core Supercomputers

  • Abhinav Sarje (Email author)
  • Xiaoye S. Li
  • Alexander Hexemer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8551)


With the continual development of multi- and many-core architectures, there is a constant need for architecture-specific tuning of application codes in order to realize high computational performance and energy efficiency close to the theoretical peaks of these architectures. In this paper, we present the optimization and tuning of HipGISAXS, a parallel X-ray scattering simulation code [9], on various massively parallel state-of-the-art supercomputers based on multi- and many-core processors. In particular, we target clusters of general-purpose multi-cores such as Intel Sandy Bridge and AMD Magny Cours, and many-core accelerators such as Nvidia Kepler GPUs and Intel Xeon Phi coprocessors. We present both high-level algorithmic and low-level architecture-aware optimization and tuning methodologies on these platforms, along with a detailed performance study of our codes on single and multiple nodes of several current top-ranking supercomputers. Additionally, we implement autotuning of many of the algorithmic and optimization parameters for dynamic selection of their optimal values to ensure high performance and high efficiency.


Keywords: Thread Block, Many Integrate Core, Strong Scaling, OpenMP Thread, Kernel Fusion




  1. Tesla Kepler GPU Accelerators. Datasheet (2012)
  2. Intel Xeon Phi Coprocessor. Developer’s Quick Start Guide. Version 1.5. White Paper (2013)
  3. Performance Application Programming Interface (PAPI) (2013)
  4. Top500 Supercomputers (June 2013)
  5. Chourou, S., Sarje, A., Li, X., Chan, E., Hexemer, A.: HipGISAXS: A High Performance Computing Code for Simulating Grazing Incidence X-Ray Scattering Data. Submitted to the Journal of Applied Crystallography (2013)
  6. Intel Corp.: Intel Xeon Phi Coprocessor Instruction Set Architecture Reference Manual (September 2012)
  7. Kim, C., Satish, N., Chhugani, J., et al.: Closing the Ninja Performance Gap through Traditional Programming and Compiler Technology. Tech. Rep. (2011)
  8. Pommier, J.: SIMD implementation of sin, cos, exp and log. Tech. Rep. (2007)
  9. Sarje, A., Li, X., Chourou, S., Chan, E., Hexemer, A.: Massively Parallel X-ray Scattering Simulations. In: Supercomputing (SC 2012) (2012)
  10. Satish, N., Kim, C., Chhugani, J., et al.: Can traditional programming bridge the Ninja performance gap for parallel computing applications? SIGARCH Computer Architecture News 40(3), 440–451 (2012)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Abhinav Sarje (1), Email author
  • Xiaoye S. Li (1)
  • Alexander Hexemer (2)
  1. Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, USA
  2. Advanced Light Source, Lawrence Berkeley National Laboratory, Berkeley, USA
