International Journal of Parallel Programming, Volume 44, Issue 5, pp. 924–948

An Instrumentation Approach for Hardware-Agnostic Software Characterization

  • Andreea Anghel
  • Laura Mihaela Vasilescu
  • Giovanni Mariani
  • Rik Jongerius
  • Gero Dittmann


Abstract

Simulators and empirical profiling data are often used to understand how suitable a specific hardware architecture is for an application. However, simulators can be slow, and empirical profiling-based methods can only provide insights about the existing hardware on which the applications are executed. While the insights obtained in this way are valuable, such methods cannot be used to evaluate a large number of system designs efficiently. Analytical performance evaluation models are fast alternatives, particularly well-suited for system design-space exploration. However, to be truly application-specific, they need to be combined with a workload model that captures relevant application characteristics. In this paper we introduce PISA, a framework based on the LLVM infrastructure that is able to generate such a model for sequential and parallel applications by performing hardware-independent characterization. Characteristics such as instruction-level parallelism, memory access patterns and branch behavior are analyzed per thread or process during application execution. To illustrate the potential of the framework, we provide a detailed characterization of a representative benchmark for graph-based analytics, Graph 500. Finally, we analyze how the properties extracted with PISA across Graph 500 and SPEC CPU2006 applications compare to measurements performed on x86 and POWER8 processors.
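The abstract names two hardware-independent characteristics that can be computed from an application trace alone: memory access patterns and branch behavior. Purely as an illustration (the function names, trace formats, and metric definitions below are assumptions for this sketch, not PISA's actual API), the following Python computes two such properties: the LRU stack reuse distance of a memory-reference trace, and the Shannon entropy of a branch's taken/not-taken outcomes.

```python
import math
from collections import OrderedDict

def reuse_distances(addresses):
    """For each access, the number of distinct addresses touched since
    the previous access to the same address (LRU stack distance).
    First-time accesses yield math.inf, i.e. a cold miss."""
    stack = OrderedDict()          # insertion order tracks recency
    distances = []
    for addr in addresses:
        if addr in stack:
            keys = list(stack.keys())
            # distinct addresses accessed more recently than addr
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(math.inf)
            stack[addr] = True
    return distances

def branch_entropy(taken_trace):
    """Shannon entropy (in bits) of a branch's outcome stream:
    0.0 means perfectly biased (easy to predict), 1.0 means coin-flip."""
    p = sum(taken_trace) / len(taken_trace)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
```

Both metrics depend only on the dynamic trace, not on any cache size or predictor design, which is what makes this style of characterization hardware-agnostic: a cache model can later be evaluated against the reuse-distance histogram, and a predictor against the entropy, without re-running the application.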


Keywords: Workload characterization · Hardware-agnostic · Graph 500 · Design-space exploration · Memory access patterns · Instruction-level parallelism · Branch entropy · Hardware measurements



Acknowledgments

This work was conducted in the context of the joint ASTRON and IBM DOME project and was funded by the Netherlands Organisation for Scientific Research (NWO), the Dutch Ministry of EL&I, and the Province of Drenthe. We would like to thank Evelina Dumitrescu for running part of the OpenMP and MPI PISA characterizations.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. IBM Research – Zurich, Rüschlikon, Switzerland
  2. University POLITEHNICA of Bucharest, Bucharest, Romania
  3. IBM Research, Dwingeloo, The Netherlands
