Turbo Analytics: Applications of Big Data and HPC in Drug Discovery

Joshi, Rajendra R.; Sonavane, Uddhavesh; Jani, Vinod; Saxena, Amit; Koulgi, Shruti; Uppuladinne, Mallikarjunachari; Sharma, Neeru; Malviya, Sandeep; Ramakrishnan, E. P.; Gavane, Vivek; Bayaskar, Avinash; Mahajan, Rashmi; Pandey, Sudhir

doi:10.1007/978-3-030-05282-9_11

Part of the book series: Challenges and Advances in Computational Chemistry and Physics (COCH, volume 27)

Abstract

In the current age of data-driven science, research in genomics, network and metabolic biology, and human, animal, organ, and tissue models of drug toxicity is capturing key biological events and interactions relevant to drug discovery. Drug design and repurposing require an understanding of the ligand orientations that allow proper binding to target molecules. Finding the right pose of a small molecule in a ligand–protein complex, a crucial requirement, is done using docking and simulation methods. Domains of biology such as genomics, biomolecular structure dynamics, and drug discovery can generate molecular data in the range of terabytes to petabytes. The analysis and visualization of these data pose a great challenge to researchers and need to be addressed in an accelerated and efficient way. There is therefore a continuing need for advanced analytics platforms and algorithms that can analyze these data faster. Big data technologies may help to provide solutions for these problems in molecular docking and simulation.

Abbreviations

PCA: Principal component analysis
RMSD: Root-mean-square deviation
RMSF: Root-mean-square fluctuation
MR: MapReduce

Author information

Authors and Affiliations

High Performance Computing-Medical & Bioinformatics Applications Group, Centre for Development of Advanced Computing (C-DAC), Savitribai Phule Pune University Campus, Pune, 411007, India
Rajendra R. Joshi, Uddhavesh Sonavane, Vinod Jani, Amit Saxena, Shruti Koulgi, Mallikarjunachari Uppuladinne, Neeru Sharma, Sandeep Malviya, E. P. Ramakrishnan, Vivek Gavane, Avinash Bayaskar, Rashmi Mahajan & Sudhir Pandey

Corresponding author

Correspondence to Rajendra R. Joshi.

Editor information

Editors and Affiliations

Amrita Centre for Nanosciences and Molecular Medicine, Amrita Institute of Medical Sciences and Research Centre, Kochi, India
C. Gopi Mohan

About this chapter

Cite this chapter

Joshi, R.R. et al. (2019). Turbo Analytics: Applications of Big Data and HPC in Drug Discovery. In: Mohan, C. (eds) Structural Bioinformatics: Applications in Preclinical Drug Discovery Process. Challenges and Advances in Computational Chemistry and Physics, vol 27. Springer, Cham. https://doi.org/10.1007/978-3-030-05282-9_11

DOI: https://doi.org/10.1007/978-3-030-05282-9_11
Published: 11 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05281-2
Online ISBN: 978-3-030-05282-9
eBook Packages: Chemistry and Materials Science, Chemistry and Material Science (R0)
