Porting Adaptive Ensemble Molecular Dynamics Workflows to the Summit Supercomputer
Abstract
Molecular dynamics (MD) simulations must take very small (femtosecond) integration steps in simulation time to avoid numerical errors. Efficient use of parallel programming models and accelerators in state-of-the-art MD programs is now pushing Moore’s limit for time per MD step. As a result, directly simulating timescales beyond milliseconds will not be attainable, even at exascale. However, concepts from statistical physics can be used to combine many parallel simulations to provide information about longer timescales and to adequately sample the simulation space, while preserving details about the dynamics of the system. Implementing such an approach requires a workflow program that allows adaptable steering of task assignments based on extensive statistical analysis of intermediate results. Here we report the implementation of such an adaptable workflow program to drive simulations on the Summit IBM Power System AC922, a pre-exascale supercomputer at the Oak Ridge Leadership Computing Facility (OLCF). We compare our experiences on Summit with those on Titan, its predecessor, report the performance of the workflow and its components, and describe the porting process. We find that a workflow program managed by a Mongo database can provide the fault tolerance, scalable performance, task dispatch rate, and reconfigurability required for a robust and portable implementation of ensemble simulations such as those used in enhanced-sampling molecular dynamics. This type of workflow generator can also provide adaptive steering of ensemble simulations for applications other than MD.
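The adaptive steering idea described above can be illustrated with a toy sketch. This is not the authors' workflow program: there is no database, scheduler, or MD engine here, and every name (`run_short_trajectory`, `pick_restart_states`, `adaptive_ensemble`) is hypothetical. Each "task" is assumed to be a short random walk on a one-dimensional chain of states, and the "statistical analysis" is assumed to be a simple visit count that restarts the next round of tasks from the least-visited states seen so far.

```python
import random
from collections import Counter


def run_short_trajectory(start, n_steps, n_states, rng):
    """Toy stand-in for one MD task: an unbiased random walk on a 1D chain."""
    state = start
    visited = [state]
    for _ in range(n_steps):
        # Step left or right, clamped to the ends of the chain.
        state = min(max(state + rng.choice((-1, 1)), 0), n_states - 1)
        visited.append(state)
    return visited


def pick_restart_states(counts, n_tasks):
    """Analysis step: rank observed states by visit count, least visited first."""
    ranked = sorted(counts, key=lambda s: (counts[s], s))
    # Cycle through the ranking if there are fewer states than tasks.
    return [ranked[i % len(ranked)] for i in range(n_tasks)]


def adaptive_ensemble(n_rounds=20, n_tasks=8, n_steps=25, n_states=50, seed=0):
    """Alternate between an ensemble of short tasks and an analysis step."""
    rng = random.Random(seed)
    counts = Counter()
    starts = [0] * n_tasks  # every task begins in the same state
    for _ in range(n_rounds):
        # A real workflow would dispatch these tasks in parallel.
        for start in starts:
            counts.update(run_short_trajectory(start, n_steps, n_states, rng))
        # Steer the next round using the intermediate results.
        starts = pick_restart_states(counts, n_tasks)
    return counts


if __name__ == "__main__":
    counts = adaptive_ensemble()
    print("distinct states visited:", len(counts))
```

Restarting from under-sampled states is only one possible steering rule, chosen here because it is easy to state; the point of the sketch is the loop structure, in which each round's task list depends on an analysis of all earlier results.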
Keywords
High Performance Computing, Molecular dynamics, Scientific workflows, Adaptive sampling
Acknowledgements
The authors would like to acknowledge Micholas Dean Smith for help with preparing the MD test systems, and Shantenu Jha and his lab for extensive help with incorporating the RADICAL-Cybertools software stack. An award of computer time was provided by the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. ORNL is managed by UT-Battelle, LLC for the US Department of Energy. FN acknowledges the European Commission (ERC CoG 772230) and the Deutsche Forschungsgemeinschaft (NO 825/3-1). JCS acknowledges DOE contract ERKP752. CC acknowledges support from the National Science Foundation (CHE-1265929, CHE-1740990, CHE-1900374, and PHY-1427654) and the Welch Foundation (C-1570).