Cooperative Write-Behind Data Buffering for MPI I/O

  • Wei-keng Liao
  • Kenin Coloma
  • Alok Choudhary
  • Lee Ward
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3666)


Large-scale production parallel programs often run for a very long time and periodically checkpoint their data to save the state of the computation for program restart and/or for tracing progress. This write-only pattern has become a dominant part of an application’s I/O workload, which makes its optimization important. Write-behind data buffering has been proposed at both the file system and MPI I/O levels, but maintaining data consistency among distributed buffers efficiently remains a challenge. To address this problem, we propose a buffering scheme that coordinates the compute processes to achieve consistency control. Unlike earlier work, our design can be applied to files opened in read-write mode and handles patterns that mix MPI collective and independent I/O calls. A performance evaluation using the BTIO and FLASH I/O benchmarks shows a significant improvement over the method without buffering.


Keywords: Write behind, MPI I/O, file consistency, data buffering, I/O thread





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Wei-keng Liao (1)
  • Kenin Coloma (1)
  • Alok Choudhary (1)
  • Lee Ward (2)

  1. Electrical and Computer Engineering Department, Northwestern University
  2. Scalable Computing Systems Department, Sandia National Laboratories
