Cache-Aware Roofline Model and Medical Image Processing Optimizations in GPUs

  • Estefania Serrano
  • Aleksandar Ilic
  • Leonel Sousa
  • Javier Garcia-Blas
  • Jesus Carretero
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11203)


When optimizing or porting applications to new architectures, a preliminary characterization is necessary to exploit the maximum computing power of the target devices. Profiling tools are available for numerous architectures and programming models, making it easier to spot possible bottlenecks. However, to interpret the collected results well, profilers rely on insightful performance models. In this paper, we describe the Cache-Aware Roofline Model (CARM) and the tools for generating it, which enable the performance characterization of GPU architectures and workloads. We use CARM to characterize two kernels that are part of a 3D iterative reconstruction application for Computed Tomography (CT). These two kernels account for most of the execution time of the whole method, which makes them suitable candidates for deeper analysis. By applying the proposed model and methodology, the overall performance of the kernels was improved by up to two times compared with the previous implementations.
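As a rough illustration of the kind of bound a roofline-style model provides, the sketch below computes attainable performance as the minimum of peak compute throughput and arithmetic intensity times memory bandwidth, with one roof per memory level as in CARM. The device numbers, level names, and function names are hypothetical placeholders chosen for illustration; they are not measurements or tools from the paper.

```python
# Minimal roofline sketch. All numbers below are hypothetical, not taken from the paper.

PEAK_GFLOPS = 4000.0  # assumed peak compute throughput (GFLOP/s)

# Assumed sustained bandwidth per memory level (GB/s), as seen from the cores.
BANDWIDTHS_GBS = {"shared": 2000.0, "L2": 800.0, "DRAM": 200.0}


def attainable_gflops(ai, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(peak compute, arithmetic intensity * bandwidth).

    ai is the arithmetic intensity in FLOP per byte.
    """
    return min(peak_gflops, ai * bandwidth_gbs)


def carm_roofs(ai):
    """Return the bound at intensity `ai` for every memory level."""
    return {
        level: attainable_gflops(ai, PEAK_GFLOPS, bw)
        for level, bw in BANDWIDTHS_GBS.items()
    }


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which a roof turns from memory-bound to compute-bound."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    for ai in (0.5, 5.0, 50.0):
        print(ai, carm_roofs(ai))
```

A kernel whose measured performance sits well below the roof of the memory level it mostly uses is a candidate for optimization, while one already close to a roof is limited by that resource.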


Medical image, Computed Tomography, CARM, GPU, Reconstruction


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Estefania Serrano (1, corresponding author)
  • Aleksandar Ilic (2)
  • Leonel Sousa (2)
  • Javier Garcia-Blas (1)
  • Jesus Carretero (1)

  1. University Carlos III of Madrid, Madrid, Spain
  2. INESC-ID, Instituto Superior Tecnico, University of Lisbon, Lisbon, Portugal
