Ant: A Debugging Framework for MPI Parallel Programs

  • Jae-Woo Lee
  • Leonardo R. Bachega
  • Samuel P. Midkiff
  • Y. C. Hu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7760)


This paper describes Ant, a debugging framework for MPI parallel programs. Ant statically analyzes a program and marks each code region as executed either by all processes or by only some of them. The analyzed program is then instrumented with calls to a library that monitors for and detects invariant violations, and the analysis lets each region be instrumented according to whether all, or fewer than all, processes execute it. In regions executed by all processes, Ant’s instrumentation strategy allows monitoring to be sampled across processes. We present a case study that uses Ant with C-DIDUCE (a variant of DIDUCE for C) to find violations of value invariants in parallel C/MPI programs. Ant’s instrumentation strategy reduces monitoring overhead by a factor of more than 14, with less impact on accuracy than a scheme that simply distributes monitoring over all processes executing the program.
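
The idea of sampling monitoring across processes can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not Ant's or C-DIDUCE's actual implementation: the round-robin policy in `should_monitor`, the choice to monitor every visit in regions run by only some processes, and the names `BitInvariant`, `should_monitor`, and `simulate` are all hypothetical. The invariant is a simplified DIDUCE-style bit mask that records which bits of a value have never changed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BitInvariant:
    """Simplified DIDUCE-style value invariant (illustrative only)."""
    first: Optional[int] = None  # first value observed at this program point
    mask: int = -1               # set bits are still believed invariant

    def observe(self, value: int) -> bool:
        """Record a value; return True if it violates the current invariant."""
        if self.first is None:
            self.first = value
            return False
        changed = (value ^ self.first) & self.mask
        self.mask &= ~changed    # relax the invariant for the bits that changed
        return changed != 0


def should_monitor(rank: int, nprocs: int, visit: int, all_procs_region: bool) -> bool:
    """Hypothetical policy: decide whether `rank` monitors this visit."""
    if not all_procs_region:
        # Assumption: regions run by only some processes are always monitored.
        return True
    # Assumption: round-robin sampling, one process monitors each visit.
    return visit % nprocs == rank


def simulate(nprocs: int, visits: int, all_procs_region: bool) -> list:
    """Count how many visits each rank monitors at one instrumented site."""
    return [
        sum(should_monitor(rank, nprocs, v, all_procs_region) for v in range(visits))
        for rank in range(nprocs)
    ]


if __name__ == "__main__":
    print(simulate(4, 8, True))   # monitoring work split across ranks
    print(simulate(4, 8, False))  # every rank monitors every visit
```

Under this policy each process monitors only about one visit in `nprocs` inside all-process regions, which is where the overhead reduction in such a scheme would come from. The abstract contrasts Ant with a scheme that simply distributes monitoring over all processes, so the sketch should be read as an illustration of the general idea rather than a reproduction of Ant's strategy.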


MPI Parallel Program Debugging Anomaly Detection DIDUCE 





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Jae-Woo Lee (1)
  • Leonardo R. Bachega (1)
  • Samuel P. Midkiff (1)
  • Y. C. Hu (1)
  1. School of Electrical and Computer Engineering, Purdue University, West Lafayette, USA
