Abstract
Resource management systems and tool support are two important factors for developing applications efficiently on large clusters. Resource management systems (typically batch queue systems) are responsible for all issues related to executing jobs on the available machines, while run-time tools (debuggers, tracers, performance analyzers, etc.) are used to ensure the correctness and efficiency of that execution. Running an application under the control of both a resource management system and a run-time tool remains a challenging problem in most cases: the restrictions imposed by resource managers can make run-time tools difficult or even impossible to use in common environments. We propose TDP-Shell, a framework that provides the mechanisms needed to enable and simplify the use of run-time tools under a given resource management system. We have analyzed the essential interactions between common run-time tools and resource management systems and implemented a pilot version of TDP-Shell. The paper describes the main components of TDP-Shell and illustrates its use with several examples.
This work was supported by MEyC-Spain under contract TIN 2004-03388, and partially supported by NATO under contract EST.EAP.CLG 981032.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ivars, V., Cortes, A., Senar, M.A. (2006). TDP_SHELL: An Interoperability Framework for Resource Management Systems and Run-Time Monitoring Tools. In: Nagel, W.E., Walter, W.V., Lehner, W. (eds) Euro-Par 2006 Parallel Processing. Euro-Par 2006. Lecture Notes in Computer Science, vol 4128. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11823285_3
DOI: https://doi.org/10.1007/11823285_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37783-2
Online ISBN: 978-3-540-37784-9
eBook Packages: Computer Science, Computer Science (R0)