Abstract
Multi-core processors are ubiquitous. Extracting the desired performance from them requires efficient techniques for partitioning a single piece of work into many fine-grained units that can be processed simultaneously. Understanding the performance behavior of a parallel system requires close familiarity with the underlying architecture and its hardware counters.
We present a performance analysis study of a multi-core system using a state-of-the-art parallel performance analyzer, the Intel VTune Performance Analyzer. As a test case we chose a classic nested-loop application that exhibits unexpected performance gains when parallelized with two different programming models on the same multi-core system. We expected to be able to explain the performance results by exploring the application's behavior with the analyzer. Instead, we found that it is very difficult to explain high-level performance measurements of multi-core systems through low-level hardware diagnosis.
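To make the idea of partitioning a nested loop into fine-grained units concrete, the following is a minimal sketch and not the application studied in the paper. The choice of matrix multiplication, the function names `multiply_rows` and `parallel_matmul`, and the `workers` and `chunk_rows` parameters are all assumptions made for illustration. The sketch shows only the decomposition of the outer loop into independent chunks; CPython threads are not expected to deliver a multi-core speedup for this kind of loop.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(a, b, row_start, row_end):
    """Compute rows [row_start, row_end) of the product a * b with a plain triple loop."""
    n_inner, n_cols = len(b), len(b[0])
    block = []
    for i in range(row_start, row_end):
        row = [0] * n_cols
        for k in range(n_inner):
            a_ik = a[i][k]
            for j in range(n_cols):
                row[j] += a_ik * b[k][j]
        block.append(row)
    return block


def parallel_matmul(a, b, workers=4, chunk_rows=1):
    """Split the outer loop into chunks of rows and submit each chunk as one task.

    chunk_rows is a hypothetical granularity knob: smaller values give
    finer-grained units of work, larger values give fewer, coarser tasks.
    """
    n_rows = len(a)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(multiply_rows, a, b, start, min(start + chunk_rows, n_rows))
            for start in range(0, n_rows, chunk_rows)
        ]
        result = []
        # Collect in submission order so the rows stay in their original order.
        for future in futures:
            result.extend(future.result())
    return result
```

Each chunk writes only its own rows, so the tasks need no synchronization beyond collecting their results. How work is divided and scheduled is one of the ways in which parallel programming models can differ.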
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Marowka, A. (2011). On Performance Analysis of a Multithreaded Application Parallelized by Different Programming Models Using Intel VTune. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2011. Lecture Notes in Computer Science, vol 6873. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23178-0_28
DOI: https://doi.org/10.1007/978-3-642-23178-0_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23177-3
Online ISBN: 978-3-642-23178-0
eBook Packages: Computer Science, Computer Science (R0)