Spatial stochastic reaction-diffusion simulation has been recognized as an essential modeling tool in computational neuroscience studies of signaling pathways, as shown by a growing number of recent studies, including our previous work on the stochastic effects of calcium dynamics in Purkinje cells [1]. A critical performance issue arises when simulating large pathway models with complex morphologies, for example the one proposed in the Human Brain Project [2], because of the serial nature of Gillespie's direct method [3], the fundamental algorithm of many spatial stochastic reaction-diffusion simulators, including STEPS [4]. Various solutions have been proposed to improve the computational efficiency of the Gillespie method, including the tau-leaping approximation [5]; most of them, however, remain serial implementations.

The need for parallel implementations of stochastic reaction-diffusion simulators has become urgent, as the scale and complexity of the models being studied outpace the speedup gained from hardware upgrades and algorithmic improvements. Such a task is not trivial, however, because the original Gillespie stochastic simulation algorithm (SSA) is inherently serial.
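To illustrate why the direct method resists parallelization, the following minimal sketch runs it on a toy reversible reaction A <-> B in a single well-mixed volume. The function name, the rate parameters and the toy system are assumptions made for illustration only; this is not the STEPS implementation. Note that each iteration recomputes the propensities from the state left by the previous event, so the events cannot simply be processed concurrently.

```python
import math
import random


def gillespie_direct(state, rate_ab, rate_ba, t_end, rng):
    """Direct-method SSA for the toy reaction A <-> B (illustrative only).

    state   -- tuple (count of A, count of B)
    rate_ab -- per-molecule rate constant of A -> B (hypothetical parameter)
    rate_ba -- per-molecule rate constant of B -> A (hypothetical parameter)
    """
    a, b = state
    t = 0.0
    while True:
        # Propensities depend on the current state, so every event must
        # wait for the previous one: this is the serial bottleneck.
        prop_ab = rate_ab * a
        prop_ba = rate_ba * b
        total = prop_ab + prop_ba
        if total == 0.0:
            break  # no reaction can fire any more
        # Exponentially distributed waiting time until the next event.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Choose which reaction fires, proportionally to its propensity.
        if rng.random() * total < prop_ab or b == 0:
            a, b = a - 1, b + 1
        else:
            a, b = a + 1, b - 1
    return a, b


if __name__ == "__main__":
    print(gillespie_direct((100, 0), 1.0, 1.0, 10.0, random.Random(0)))
```

In a spatial simulator the same loop also has to handle diffusion events between subvolumes, which typically far outnumber reaction events and make the serial dependency even more costly.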

At CNS2014 we proposed a parallel solution that approximates diffusion events in STEPS [6], which significantly improves performance while maintaining high accuracy. We now further improve this solution by introducing a multinomial algorithm for fast selection of diffusion directions for multiple molecules in a single subvolume. We also combine this diffusion approximation with a new operator-splitting solution for reaction events in the SSA system. The combined solution is implemented in STEPS as its first parallel solver, named TetOpSplit. The current implementation of TetOpSplit uses MPI as its parallel protocol and aims to support large-scale simulations, such as whole-cell reaction-diffusion models, on modern supercomputers such as Blue Gene.
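The idea behind multinomial direction selection can be sketched as follows: instead of drawing one random direction per molecule, the number of molecules leaving through each face of a subvolume is drawn in a single multinomial sample. The sketch below uses the standard conditional-binomial decomposition of the multinomial distribution; the function names and the weights are hypothetical, the weights are assumed to be strictly positive, and the actual algorithm used in TetOpSplit may differ.

```python
import random


def binomial(rng, n, p):
    """Draw from Binomial(n, p)."""
    if n == 0 or p <= 0.0:
        return 0
    if p >= 1.0:
        return n
    if hasattr(rng, "binomialvariate"):  # available from Python 3.12
        return rng.binomialvariate(n, p)
    # Simple fallback for older Python versions: sum of Bernoulli trials.
    return sum(rng.random() < p for _ in range(n))


def multinomial_directions(rng, n_molecules, weights):
    """Split n_molecules among directions with probability proportional
    to weights (assumed strictly positive, e.g. per-face diffusion rates).
    """
    counts = []
    remaining = n_molecules
    weight_left = float(sum(weights))
    for w in weights[:-1]:
        # Conditional on the earlier draws, the count for this direction
        # is binomial over the molecules that are still unassigned.
        k = binomial(rng, remaining, w / weight_left) if weight_left > 0 else 0
        counts.append(k)
        remaining -= k
        weight_left -= w
    counts.append(remaining)  # the last direction takes whatever is left
    return counts


if __name__ == "__main__":
    # Four neighbours of a tetrahedral subvolume with hypothetical weights.
    print(multinomial_directions(random.Random(1), 1000, [1.0, 2.0, 3.0, 4.0]))
```

With an efficient binomial sampler, the cost of this approach grows with the number of directions rather than with the number of diffusing molecules, which is where the speedup for crowded subvolumes would come from.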

In this poster we discuss the difficulties we encountered during the transformation from the serial TetOpSplit algorithm to its parallel counterpart, as well as our solutions. We also present performance results for the new solver on several examples and compare them with the results obtained with our original serial SSA implementation.