Application Suitability Assessment for Many-Core Targets

  • Chris J. Newburn
  • Jim Sukha
  • Ilya Sharapov
  • Anthony D. Nguyen
  • Chyi-Chang Miao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9945)


Many-core hardware platforms offer a tremendous opportunity for scaling up performance, but not all codes that run on these platforms have been modernized sufficiently to fully utilize the hardware. Assessing whether a code will effectively utilize a given platform can be challenging, particularly for new or potential future platforms where native execution on real hardware is not possible. In this case, one typically relies on architecture simulators and other workload characterization tools, which are often not user-friendly for developers who want to do a quick initial assessment of an application’s suitability for a many-core architecture.

To help address this challenge, we present QMSprof, a tool and a set of analyses for an initial assessment of the suitability of a set of applications for a simulated, extremely parallel many-core target. QMSprof automates the process of running a suite of workload binaries through the Intel® Software Development Emulator (SDE) and the Sniper multi-core simulator and extracting high-level summary statistics. The tool generates comparative plots summarizing key metrics across the workload suite, including the mix of vector and nonvector instructions, scalability with increasing thread count, memory bandwidth utilization, and statistics on cache misses and working set size. These summary metrics are designed to help performance tuners select promising codes for a many-core target and pinpoint opportunities for additional tuning. To illustrate the utility of our tool, we also describe some sample results from characterizing applications on a hypothetical many-core architecture.
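As a rough illustration of two of the summary metrics named above (the vector instruction mix and scalability with thread count), the following minimal Python sketch computes them from per-run counts. It is not QMSprof's implementation: the `WorkloadRun` record, its field names, and both helper functions are hypothetical, and the sketch assumes the instruction counts and run times have already been extracted from emulator or simulator output by some other means.

```python
from dataclasses import dataclass


@dataclass
class WorkloadRun:
    """Hypothetical per-run record; the field names are illustrative only."""
    name: str
    threads: int
    vector_instructions: int
    total_instructions: int
    elapsed_seconds: float


def vector_fraction(run):
    """Fraction of executed instructions that were vector instructions."""
    if run.total_instructions == 0:
        return 0.0
    return run.vector_instructions / run.total_instructions


def scaling_summary(runs):
    """Map thread count -> (speedup, parallel efficiency).

    Both values are measured relative to the run with the fewest threads.
    """
    ordered = sorted(runs, key=lambda r: r.threads)
    base = ordered[0]
    summary = {}
    for run in ordered:
        speedup = base.elapsed_seconds / run.elapsed_seconds
        efficiency = speedup * base.threads / run.threads
        summary[run.threads] = (speedup, efficiency)
    return summary
```

A parallel efficiency well below 1.0 at higher thread counts, or a low vector fraction, would be the kind of signal that suggests a code needs further tuning before it is a good fit for a many-core target.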


Many-core Performance Characterization Code modernization 



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Chris J. Newburn (1)
  • Jim Sukha (1)
  • Ilya Sharapov (1)
  • Anthony D. Nguyen (1)
  • Chyi-Chang Miao (1)

  1. Intel Corporation, Hudson, USA
