Lightweight Instrumentation and Analysis Using OpenSHMEM Performance Counters

Rahman, Md. Wasi-ur-; Ozog, David; Dinan, James

doi:10.1007/978-3-030-04918-8_12

Lightweight Instrumentation and Analysis Using OpenSHMEM Performance Counters

Md. Wasi-ur- Rahman¹⁸,
David Ozog¹⁹ &
James Dinan¹⁹

Conference paper
First Online: 19 March 2019

340 Accesses
2 Citations

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 11283))

Abstract

Partitioned Global Address Space (PGAS) programming models, such as OpenSHMEM, are popular methods of parallel programming; however, performance monitoring and analysis tools for these models have remained elusive. In this work, we propose a performance counter extension to the OpenSHMEM interfaces to expose internal communication state as lightweight performance data to tools. We implement our interface in the open source Sandia OpenSHMEM library and demonstrate its mapping to libfabric primitives. Next, we design a simple collector tool to record the behavior of OpenSHMEM processes at execution time. We analyze the Integer Sort (ISx) benchmark and use the resulting data to investigate several common performance issues—including communication schedule, poor overlap, and load imbalance—and visualize the impact of optimizations to correct these issues. Through this study, our tool uncovered a performance bug in this popular benchmark. Finally, by using our tool to guide the application of several pipelining optimizations, we were able to improve the ISx key exchange performance by more than 30%.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

1.
Other names and brands may be claimed as the property of others.
2.
Intel and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.
Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as “Spectre” and “Meltdown”. Implementation of these updates may make these results inapplicable to your device or system.
Software and workloads used in performance tests may have been optimized for performance only on Intel\(^{\textregistered }\) microprocessors. Performance tests, such as SYSmark\(^{\star }\) and MobileMark\(^{\star }\), are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
For more information go to http://www.intel.com/benchmarks.

References

Adhianto, L., et al.: HPCTOOLKIT: tools for performance analysis of optimized parallel programs. Concurr. Comput.: Pract. Exper. 22(6), 685–701 (2010). Http://hpctoolkit.Org
Google Scholar
Barrett, B.W., Brigthwell, R., Hemmert, K.S., Pedretti, K., Wheeler, K., Underwood, K.D.: Enhanced support for openSHMEM communication in portals. In: IEEE 19th Annual Symposium on High Performance Interconnects. HotI, August 2011
Google Scholar
Brandt, J., Froese, E., Gentile, A., Kaplan, L., Allan, B., Walsh, E.: Network performance counter monitoring and analysis on the Cray XC platform. In: Proceedings of Cray Users Group (2016)
Google Scholar
Browne, S., Dongarra, J., Garner, N., London, K., Mucci, P.: A scalable cross-platform infrastructure for application performance tuning using hardware counters. In: Proceedings of the 2000 ACM/IEEE Conference on Supercomputing. SC 2000, IEEE Computer Society, Washington, DC, USA (2000)
Google Scholar
Cong, G., Wen, H., Murata, H., Negishi, Y.: Tool-assisted optimization of shared-memory accesses in UPC applications. In: IEEE International Conference on High Performance Computing and Communication & IEEE International Conference on Embedded Software and Systems, (HPCC-ICESS), pp. 104–111, June 2012
Google Scholar
DeRose, L., Homer, B., Johnson, D., Kaufmann, S., Poxon, H.: Cray performance analysis tools. In: Resch, M., Keller, R., Himmler, V., Krammer, B., Schulz, A. (eds.) Tools for High Performance Computing, pp. 191–199. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-68564-7_12
Chapter Google Scholar
Eschweiler, D., Wagner, M., Geimer, M., Knpfer, A., Nagel, W., Wolf, F.: Open trace format 2: The next generation of scalable trace formats and support libraries. In: Applications, Tools and Techniques on the Road to Exascale Computing. vol. 22, pp. 481–490, January 2012
Google Scholar
Grun, P., et al.: A brief introduction to the openfabrics interfaces - a new network API for maximizing high performance application efficiency. In: 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects, pp. 34–39, August 2015
Google Scholar
Hanebutte, U., Hemstad, J.: ISx: A scalable integer sort for co-design in the exascale era. In: 2015 9th International Conference on Partitioned Global Address Space Programming Models (PGAS), pp. 102–104, September 2015
Google Scholar
Hanebutte, U., Hemstad, J.: ISx: a scalable integer sort for co-design in the exascale era. In: 9th International Conference on Partitioned Global Address Space Programming Models. pp. 102–104, September 2015
Google Scholar
Hermanns, M.-A., Geimer, M., Mohr, B., Wolf, F.: Scalable detection of MPI-2 remote memory access inefficiency patterns. In: Ropo, M., Westerholm, J., Dongarra, J. (eds.) EuroPVM/MPI 2009. LNCS, vol. 5759, pp. 31–41. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03770-2_10
Chapter Google Scholar
Knüpfer, A., et al.: Score-P: a joint performance measurement run-time infrastructure for periscope, scalasca, TAU, and vampir. Tools for High Performance Computing, pp. 79–91. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31476-6_7
Chapter Google Scholar
Linford, J., Simon, T.A., Shende, S., Malony, A.D.: Profiling non-numeric OpenSHMEM applications with the TAU performance system. In: Poole, S., Hernandez, O., Shamis, P. (eds.) OpenSHMEM 2014. LNCS, vol. 8356, pp. 105–119. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-05215-1_8
Chapter Google Scholar
Linford, J.C., Khuvis, S., Shende, S., Malony, A., Imam, N., Venkata, M.G.: Performance analysis of openSHMEM applications with TAU commander. In: Gorentla Venkata, M., Imam, N., Pophale, S. (eds.) OpenSHMEM 2017. LNCS, vol. 10679, pp. 161–179. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73814-7_11
Chapter Google Scholar
Mohr, B., Kühnal, A., Hermanns, M., Wolf, F.: Performance analysis of one-sided communication mechanisms. In: Joubert, G.R., Nagel, W.E., Peters, F.J., Plata, O.G., Tirado, P., Zapata, E.L. (eds.) Parallel Computing: Current & Future Issues of High-End Computing, Proceedings of the International Conference ParCo 2005. John von Neumann Institute for Computing Series, 13–16 September 2005, Department of Computer Architecture, University of Malaga, Spain, vol. 33, pp. 885–892. Central Institute for Applied Mathematics, Jülich (2005)
Google Scholar
MPI Forum: MPI: A message-passing interface standard version 3.1. Technical report, University of Tennessee, Knoxville, June 2015
Google Scholar
Oeste, S., Knüpfer, A., Ilsche, T.: Towards parallel performance analysis tools for the openSHMEM standard. In: Poole, S., Hernandez, O., Shamis, P. (eds.) OpenSHMEM 2014. LNCS, vol. 8356, pp. 90–104. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-05215-1_7
Chapter Google Scholar
OpenSHMEM application programming interface, version 1.3., February 2016. http://www.openshmem.org
OpenSHMEM application programming interface, version 1.4., December 2017. http://www.openshmem.org
Pedretti, K., Vaughan, C.T., Barrett, R.F., Devine, K.D., Hemmert, K.S.: Using the Cray Gemini performance counters. In: Proceedings of the Cray Users Group (2013)
Google Scholar
Portals 4.0. http://www.cs.sandia.gov/Portals/portals4.html
Performance Scaled Messaging 2 (PSM2) Programmer’s Guide, October 2017. https://intel.ly/2y2uvjb
Sandia OpenSHMEM (2018). https://github.com/Sandia-OpenSHMEM/SOS
Seager, K., Choi, S.-E., Dinan, J., Pritchard, H., Sur, S.: Design and implementation of openSHMEM using OFI on the aries interconnect. In: Gorentla Venkata, M., Imam, N., Pophale, S., Mintz, T.M. (eds.) OpenSHMEM 2016. LNCS, vol. 10007, pp. 97–113. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-50995-2_7
Chapter Google Scholar
Su, H.H., Billingsley, M., George, A.D.: Parallel performance wizard: a performance system for the analysis of partitioned global-address-space applications. Int. J. High Perform. Comput. Appl. 24(4), 485–510 (2010)
Article Google Scholar
Su, H.-H., Bonachea, D., Leko, A., Sherburne, H., Billingsley, M., George, A.D.: GASP! a standardized performance analysis tool interface for global address space programming models. In: Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J. (eds.) PARA 2006. LNCS, vol. 4699, pp. 450–459. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-75755-9_54
Chapter Google Scholar
Tallent, N.R., Vishnu, A., Van Dam, H., Daily, J., Kerbyson, D.J., Hoisie, A.: Diagnosing the causes and severity of one-sided message contention. In: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015, pp. 130–139. ACM, New York, NY, USA (2015)
Google Scholar
UPC Consortium: UPC language and library specifications, v1.3. Technical Report LBNL-6623E, Lawrence Berkeley National Lab, November 2013
Google Scholar
Van der Wijngaart, R.F., et al.: Comparing runtime systems with exascale ambitions using the parallel research Kernels. In: Kunkel, J.M., Balaji, P., Dongarra, J. (eds.) ISC High Performance 2016. LNCS, vol. 9697, pp. 321–339. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41321-1_17
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

Intel Corporation, Austin, TX, USA
Md. Wasi-ur- Rahman
Intel Corporation, Hudson, MA, USA
David Ozog & James Dinan

Authors

Md. Wasi-ur- Rahman
View author publications
You can also search for this author in PubMed Google Scholar
David Ozog
View author publications
You can also search for this author in PubMed Google Scholar
James Dinan
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Md. Wasi-ur- Rahman .

Editor information

Editors and Affiliations

Oak Ridge National Laboratory, Oak Ridge, TN, USA
Swaroop Pophale
Oak Ridge National Laboratory, Oak Ridge, TN, USA
Neena Imam
Oak Ridge National Laboratory, Oak Ridge, TN, USA
Ferrol Aderholdt
Oak Ridge National Laboratory, Oak Ridge, TN, USA
Manjunath Gorentla Venkata

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Rahman, M.Wu., Ozog, D., Dinan, J. (2019). Lightweight Instrumentation and Analysis Using OpenSHMEM Performance Counters. In: Pophale, S., Imam, N., Aderholdt, F., Gorentla Venkata, M. (eds) OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity. OpenSHMEM 2018. Lecture Notes in Computer Science(), vol 11283. Springer, Cham. https://doi.org/10.1007/978-3-030-04918-8_12

Download citation

DOI: https://doi.org/10.1007/978-3-030-04918-8_12
Published: 19 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04917-1
Online ISBN: 978-3-030-04918-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics