Google-Accelerated Biomolecular Simulations
Biomolecular simulations rely heavily on suitable compute infrastructure for data-driven tasks such as modeling, sampling, and analysis. These resources are typically available on a per-lab or per-facility basis, or through dedicated national supercomputing centers. In recent years, cloud computing has emerged as an alternative, offering an abundance of on-demand, specialist-maintained resources that improve efficiency and shorten turnaround times through rapid scaling.
Scientific computations that take the form of parallel workloads over large datasets are commonplace, making them ideal candidates for distributed computing in the cloud. Recent developments have greatly simplified the tasks of configuring the cloud and submitting jobs for the experimenter. This chapter shows how to use Google’s Cloud Platform for biomolecular simulations, using the molecular dynamics package GROningen MAchine for Chemical Simulations (GROMACS) as an example. The instructions transfer readily to a large variety of other tasks, allowing readers to use the cloud for their own purposes.
Importantly, the use of Docker containers, a popular lightweight virtualization solution, together with cloud storage addresses key issues in scientific research: reproducibility of results, record keeping, and the ability of other researchers to obtain copies and build directly upon previous work for further experimentation and hypothesis testing.
Key words: Cloud computing, Large-scale simulation, Distributed computing
This work was performed on Google infrastructure. The author thanks Jojo Dijamco for many detailed discussions and careful review of the manuscript, and members of the Google Accelerated Science team for helpful feedback.