Towards Flexible, Reliable, High Throughput Parallel Discrete Event Simulations
The excessive amount of time necessary to complete large-scale discrete-event simulations of complex systems such as telecommunication networks, transportation systems, and multiprocessor computers continues to plague researchers and impede progress in many important domains. Parallel discrete-event simulation techniques offer an attractive solution to this problem by enabling scalable execution, and much prior research has focused on this approach. However, effective, practical solutions have been elusive, largely because of the complexity and difficulty of developing parallel discrete-event simulation systems and software. In particular, an effective parallel execution mechanism must simultaneously address a variety of issues, such as efficient synchronization, resource allocation, load distribution, and fault tolerance, to name a few. Further, most systems developed to date assume that a set of processors is dedicated to completing the simulation computation, yielding inflexible execution environments. Effective exploitation of parallelization techniques requires that these issues be addressed automatically by the underlying system, in a manner largely transparent to the application developer.
We describe an approach to executing large-scale parallel discrete-event simulation programs over multiple-processor computing resources that automates many of the tasks associated with parallel execution. Based on concepts used in volunteer distributed computing projects, we propose an approach that supports execution over computing platforms shared with other users. This approach automatically adapts as new processors are added or existing processors are taken away during the course of the execution. A runtime environment based on a master-worker approach automatically distributes simulation computations over the available processors to balance workload, and automatically recovers from processor failures that may occur during the computation. Synchronization, a particularly important problem for parallel discrete-event simulation computations, is handled automatically by the underlying runtime system.
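The master-worker distribution and failure recovery described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the `Master` class, its method names, and the integer work units are hypothetical, and the sketch leaves out the synchronization of simulation time entirely. It only shows the general idea that workers pull work units from a master, and that units held by a lost worker are returned to the queue for another worker.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical master: hands out work units and reissues lost ones."""
    pending: deque                                  # work unit ids awaiting a worker
    leased: dict = field(default_factory=dict)      # work unit id -> worker id
    done: dict = field(default_factory=dict)        # work unit id -> result

    def request_work(self, worker_id):
        """A worker pulls the next work unit, or None if nothing is pending."""
        if not self.pending:
            return None
        unit = self.pending.popleft()
        self.leased[unit] = worker_id
        return unit

    def report_result(self, worker_id, unit, result):
        """Accept a result only from the worker that currently holds the lease."""
        if self.leased.get(unit) == worker_id:
            del self.leased[unit]
            self.done[unit] = result

    def worker_lost(self, worker_id):
        """On failure or withdrawal, return the worker's leased units to the queue."""
        lost = [u for u, w in self.leased.items() if w == worker_id]
        for unit in lost:
            del self.leased[unit]
            self.pending.append(unit)


def run_demo():
    master = Master(pending=deque(range(4)))
    # Worker "a" takes one unit and then disappears before reporting a result.
    master.request_work("a")
    master.worker_lost("a")
    # Worker "b" joins later and drains the queue, including the reissued unit.
    while (unit := master.request_work("b")) is not None:
        master.report_result("b", unit, unit * unit)  # placeholder computation
    return master.done
```

Because workers pull work rather than having it pushed to them, faster or newly added processors naturally take on more units, which is one simple way to obtain the load balancing and adaptivity the abstract refers to.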
Keywords: Work Unit, Message Queue, Primary Server, Parallel Virtual Machine, Fault Tolerance System