Abstract
Process migration is the ability to move a running process from one node to another, where it continues executing. The MPI standard prescribes support for process migration, but so far it has been implemented mostly via checkpoint-restart. This paper presents an automatic and transparent process migration framework that can be used for MPI processes. The framework is advantageous when migrating individual processes, for purposes such as load balancing, is more appropriate than checkpointing the whole job. The paper describes the framework for process migration in clusters and multi-clusters, how it was tuned for Open MPI, and the performance of migrated MPI processes.
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Barak, A., Margolin, A., Shiloh, A. (2012). Automatic Resource-Centric Process Migration for MPI. In: Träff, J.L., Benkner, S., Dongarra, J.J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2012. Lecture Notes in Computer Science, vol 7490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33518-1_21
DOI: https://doi.org/10.1007/978-3-642-33518-1_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33517-4
Online ISBN: 978-3-642-33518-1
eBook Packages: Computer Science (R0)