Abstract
In this paper, we describe the implementation of memory checking functionality that is based on instrumentation tools. The combination of instrumentation based checking functions and the MPI-implementation offers superior debugging functionalities, for errors that otherwise are not possible to detect with comparable MPI-debugging tools. Our implementation contains three parts: first, a memory callback extension that is implemented on top of the Valgrind Memcheck tool for advanced memory checking in parallel applications; second, a new instrumentation tool was developed based on the Intel Pin framework, which provides similar functionality as Memcheck it can be used in Windows environments that have no access to the Valgrind suite; third, all the checking functionalities are integrated as the so-called memchecker framework within Open MPI. This will also allow other memory debuggers that offer a similar API to be integrated. The tight control of the user’s memory passed to Open MPI, allows us to detect application errors and to track bugs within Open MPI itself. The extension of the callback mechanism targets communication buffer checks in both pre- and post-communication phases, in order to analyze the usage of the received data, e.g. whether the received data has been overwritten before it is used in an computation or whether the data is never used. We describe our actual checks, classes of errors being found, how memory buffers are being handled internally, show errors actually found in user’s code, and the performance implications of our instrumentation.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsNotes
- 1.
The MemPin tool in this work is developed only targeting at Windows platforms, although it may be used under Linux too.
- 2.
E.g. this showed up uninitialized data in derived objects, e.g. communicators created using MPI_Comm_dup.
References
Bcheck Man Page from SUN Developers Website. Internet. http://developers.sun.com/sunstudio/documentation/ss11/mr/man1/bcheck.1.html (2011)
DeSouza, J., Kuhn, B., de Supinski, B.R.: Automated, scalable debugging of MPI programs with Intel message checker. In: Proceedings of the 2nd International Workshop on Software Engineering for High Performance Computing System Applications, vol. 4, pp. 78–82. ACM Press, New York (2005)
Keller, R., Resch, M.: Testing the correctness of MPI implementations. In: Proceedings of the 5th International Symposium on Parallel and Distributed Computing Conference, Timisoara, pp. 291–295 (2006)
Keller, R., Fan, S., Resch, M.: Memory debugging of MPI-parallel applications in open MPI. In: Joubert, G., Bischof, C., Peters, F., Lippert, T., Bucker, M., Gibbon, P., Mohr B. (eds.) Proceedings of ParCo’07, Julich (2007)
Krammer, B., Mueller, M.S., Resch, M.M.: Runtime checking of MPI applications with Marmot. In: Proceedings of the International Conference ParCo 2005, Malaga (2005)
Luk, C.K., Cohn, R., Muth, R., Patil, H., Klauser, A., Lowney, G., Wallace, S., Reddi, V.J., Hazelwood, K.: Pin: building customized program analysis tools with dynamic instrumentation. In: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 190–200. ACM, New York (2005)
Message Passing Interface Forum: MPI: A Message Passing Interface Standard. http://www.mpi-forum.org (1995)
Message Passing Interface Forum: MPI-2: Extensions to the Message-Passing Interface. http://www.mpi-forum.org (1997)
MPI Forum Ticket Number 45. Internet. https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/45 (2008)
Seward, J., Nethercote, N.: Using Valgrind to detect undefined value errors with bit-precision. In: Proceedings of the USENIX’05 Annual Technical Conference, Anaheim (2005)
Shiqing Fan, R.K., Resch, M.: Enhanced memory debugging of mpi-parallel applications in open mpi. In: 4th Parallel Tools Workshop, Stuttgart (2010)
Srivastava, A., Eustace, A.: Atom: A System for Building Customized Program Analysis Tools, pp. 196–205. ACM, New York (1994)
The Open Fabrics Project Webpage. https://www.openfabrics.org (2007)
Totalview Memory Debugging capabilities. http://www.etnus.com/TotalView/Memory.html
Vetter, J.S., de Supinski, B.R.: Dynamic software testing of MPI applications with umpire. In: Proceedings of Supercomputing (SC), Dallas. http://www.sc2000.org/proceedings/techpapr/index.htm (2000)
Woodall, T., Graham, R., Castain, R., Daniel, D., Sukalski, M., Fagg, G., Gabriel, E., Bosilca, G., Angskun, T., Dongarra, J., Squyres, J., Sahay, V., Kambadur, P., Barrett, B., Lumsdaine, A.: Open MPI’s TEG point-to-point communications methodology: comparison to existing implementations. In: Recent Advances in Parallel Virtual Machine and Message Passing Interface, vol. 3241, pp. 105–111. Springer, Budapest (2004)
Acknowledgements
This work was funded by project Int.EU.Grid (Interactive EUropean Grid) with EU-Contract Number 031857, by project DORII (Deployment of Remote Instrumentation Infrastructure) with EU-Contract Number 213110 and by Microsoft Technical Computing Initiative (TCI) since March 2007.
We would like to thank Julian Seward and the open source community for valgrind, which has proven invaluable in many software projects.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fan, S., Keller, R., Resch, M. (2012). Advanced Memory Checking Frameworks for MPI Parallel Applications in Open MPI. In: Brunst, H., Müller, M., Nagel, W., Resch, M. (eds) Tools for High Performance Computing 2011. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31476-6_6
Download citation
DOI: https://doi.org/10.1007/978-3-642-31476-6_6
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31475-9
Online ISBN: 978-3-642-31476-6
eBook Packages: Computer ScienceComputer Science (R0)