NUMA-Aware Task Performance Analysis

  • Conference paper
OpenMP: Memory, Devices, and Tasks (IWOMP 2016)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9903)

Abstract

The tasking feature enriches OpenMP with a more general way to express parallelism than was previously available: it can be applied to loops, but also to recursive algorithms, without the need for nested parallel regions. However, the performance of a tasking program depends heavily on the task scheduling inside the OpenMP runtime. The runtime's influence is especially significant on large NUMA systems when tasks work on shared data structures that are split across NUMA nodes. For a programmer, there is no easy way to examine these performance-relevant decisions taken by the runtime, neither with functionality provided by OpenMP nor with external performance tools. We therefore present a method based on the Score-P measurement infrastructure that enables a deeper analysis of task-parallel programs on NUMA systems, allowing the user to see whether tasks were executed by the creating thread or remotely, on the same or a different socket. As an example, the same task-parallel code was executed with the Intel and the GNU compilers, and a performance difference of 8x was observed, mainly due to task scheduling. We evaluate the presented method by investigating both execution runs and highlighting the differences in the task scheduling applied.
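The three execution categories named in the abstract (executed by the creating thread, executed remotely on the same socket, executed remotely on a different socket) can be illustrated with a minimal sketch. This is not the Score-P implementation or its data model: the `TaskRecord` fields, the `SOCKET_OF` thread-to-socket map, and the sample tasks are all hypothetical and chosen only to show how such a classification might be computed once the creating and executing thread of each task are known.

```python
from collections import Counter
from typing import NamedTuple


class TaskRecord(NamedTuple):
    """One executed task. Hypothetical fields, not Score-P's data model."""
    task_id: int
    creator_thread: int   # thread that created the task
    executor_thread: int  # thread that actually ran the task


def classify(task, socket_of):
    """Return 'local', 'remote_same_socket', or 'remote_other_socket'."""
    if task.creator_thread == task.executor_thread:
        return "local"
    if socket_of[task.creator_thread] == socket_of[task.executor_thread]:
        return "remote_same_socket"
    return "remote_other_socket"


def summarize(tasks, socket_of):
    """Count how many tasks fall into each locality category."""
    return Counter(classify(t, socket_of) for t in tasks)


# Assumed topology: threads 0-1 pinned to socket 0, threads 2-3 to socket 1.
SOCKET_OF = {0: 0, 1: 0, 2: 1, 3: 1}

# Made-up sample records for illustration only.
TASKS = [
    TaskRecord(0, creator_thread=0, executor_thread=0),
    TaskRecord(1, creator_thread=0, executor_thread=1),
    TaskRecord(2, creator_thread=0, executor_thread=3),
    TaskRecord(3, creator_thread=2, executor_thread=3),
]

if __name__ == "__main__":
    for category, count in sorted(summarize(TASKS, SOCKET_OF).items()):
        print(f"{category}: {count}")
```

A high share of tasks in the different-socket category would point to frequent cross-socket task execution, which is the kind of scheduling behavior the abstract identifies as performance relevant on NUMA systems.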

Acknowledgement

This work was funded by the German Federal Ministry of Research and Education (BMBF) under Grant Number 01IH13001D (Score-E).

Corresponding author

Correspondence to Dirk Schmidl.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Schmidl, D., Müller, M.S. (2016). NUMA-Aware Task Performance Analysis. In: Maruyama, N., de Supinski, B., Wahib, M. (eds) OpenMP: Memory, Devices, and Tasks. IWOMP 2016. Lecture Notes in Computer Science, vol 9903. Springer, Cham. https://doi.org/10.1007/978-3-319-45550-1_6

  • DOI: https://doi.org/10.1007/978-3-319-45550-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45549-5

  • Online ISBN: 978-3-319-45550-1

  • eBook Packages: Computer Science (R0)
